Abstract

Compressive sensing is a signal acquisition framework based on the revelation that a small col- 
lection of linear projections of a sparse signal contains enough information for stable recovery. 
In this paper we introduce a new theory for distributed compressive sensing (DCS) that en- 
ables new distributed coding algorithms for multi-signal ensembles that exploit both intra- and 
inter-signal correlation structures. The DCS theory rests on a new concept that we term the 
joint sparsity of a signal ensemble. Our theoretical contribution is to characterize the funda- 
mental performance limits of DCS recovery for jointly sparse signal ensembles in the noiseless 
measurement setting; our result connects single-signal, joint, and distributed (multi-encoder) 
compressive sensing. To demonstrate the efficacy of our framework and to show that additional 
challenges such as computational tractability can be addressed, we study in detail three example 
models for jointly sparse signals. For these models, we develop practical algorithms for joint 
recovery of multiple signals from incoherent projections. In two of our three models, the results 
are asymptotically best-possible, meaning that both the upper and lower bounds match the 
performance of our practical algorithms. Moreover, simulations indicate that the asymptotics 
take effect with just a moderate number of signals. DCS is immediately applicable to a range 
of problems in sensor arrays and networks. 

Keywords: Compressive sensing, distributed source coding, sparsity, random projection, random matrix, 
linear programming, array processing, sensor networks. 

1 Introduction 



A core tenet of signal processing and information theory is that signals, images, and other data often 
contain some type of structure that enables intelligent representation and processing. The notion
of structure has been characterized and exploited in a variety of ways for a variety of purposes. In 
this paper, we focus on exploiting signal correlations for the purpose of compression. 

Current state-of-the-art compression algorithms employ a decorrelating transform such as an
exact or approximate Karhunen-Loeve transform (KLT) to compact a correlated signal's energy 
into just a few essential coefficients [5-7]. Such transform coders exploit the fact that many signals 
have a sparse representation in terms of some basis, meaning that a small number K of adaptively 
chosen transform coefficients can be transmitted or stored rather than $N \gg K$ signal samples. For
example, smooth signals are sparse in the Fourier basis, and piecewise smooth signals are sparse 
in a wavelet basis [8]; the commercial coding standards MP3 [9], JPEG [10], and JPEG2000 [11]
directly exploit this sparsity. 

1.1 Distributed source coding 

While the theory and practice of compression have been well developed for individual signals, 
distributed sensing applications involve multiple signals, for which there has been less progress. 
Such settings are motivated by the proliferation of complex, multi-signal acquisition architectures, 
such as acoustic and RF sensor arrays, as well as sensor networks. These architectures sometimes 
involve battery-powered devices, which restrict the communication energy, and high aggregate data 
rates, limiting bandwidth availability; both factors make the reduction of communication critical. 

Fortunately, since the sensors presumably observe related phenomena, the ensemble of signals 
they acquire can be expected to possess some joint structure, or inter-signal correlation, in addition
to the intra-signal correlation within each individual sensor's measurements. In such settings, 
distributed source coding that exploits both intra- and inter-signal correlations might allow the 
network to save on the communication costs involved in exporting the ensemble of signals to the 
collection point [12-15]. A number of distributed coding algorithms have been developed that 
involve collaboration amongst the sensors [16-19]. Note, however, that any collaboration involves 
some amount of inter-sensor communication overhead. 

In the Slepian-Wolf framework for lossless distributed coding [12-15], the availability of correlated
side information at the decoder (collection point) enables each sensor node to communicate
losslessly at its conditional entropy rate rather than at its individual entropy rate, as long as the sum 
rate exceeds the joint entropy rate. Slepian-Wolf coding has the distinct advantage that the sen- 
sors need not collaborate while encoding their measurements, which saves valuable communication 
overhead. Unfortunately, however, most existing coding algorithms [14, 15] exploit only inter-signal 
correlations and not intra-signal correlations. To date there has been only limited progress on 
distributed coding of so-called "sources with memory." The direct implementation for sources with 
memory would require huge lookup tables [12]. Furthermore, approaches combining pre- or post- 
processing of the data to remove intra-signal correlations combined with Slepian-Wolf coding for 
the inter-signal correlations appear to have limited applicability, because such processing would 
alter the data in a way that is unknown to other nodes. Finally, although recent papers [20-22] 
provide compression of spatially correlated sources with memory, the solution is specific to lossless 
distributed compression and cannot be readily extended to lossy compression settings. We conclude
that the design of constructive techniques for distributed coding of sources with both intra- and 
inter-signal correlation is a challenging problem with many potential applications. 

1.2 Compressive sensing (CS) 

A new framework for single-signal sensing and compression has developed under the rubric of 
compressive sensing (CS). CS builds on the work of Candès, Romberg, and Tao [23] and Donoho [24],
who showed that if a signal has a sparse representation in one basis then it can be recovered from 
a small number of projections onto a second basis that is incoherent with the first. CS relies on
tractable recovery procedures that can provide exact recovery of a signal of length N and sparsity 
K, i.e., a signal that can be written as a sum of K basis functions from some known basis, where 
K can be orders of magnitude less than N. 

The implications of CS are promising for many applications, especially for sensing signals that 
have a sparse representation in some basis. Instead of sampling a K-sparse signal N times, only 
$M = O(K \log N)$ incoherent measurements suffice, where K can be orders of magnitude less than
N. Moreover, the M measurements need not be manipulated in any way before being transmitted, 
except possibly for some quantization. Finally, independent and identically distributed (i.i.d.) 
Gaussian or Bernoulli/Rademacher (random ±1) vectors provide a useful universal basis that is 
incoherent with all others. Hence, when using a random basis, CS is universal in the sense that the 
sensor can apply the same measurement mechanism no matter what basis sparsifies the signal [27].

While powerful, the CS theory at present is designed mainly to exploit intra-signal structures 
at a single sensor. In a multi-sensor setting, one can naively obtain separate measurements from 
each signal and recover them separately. However, it is possible to obtain measurements that each 
depend on all signals in the ensemble by having sensors collaborate with each other in order to 
combine all of their measurements; we term this process a joint measurement setting. In fact, initial 
work in CS for multi-sensor settings used standard CS with joint measurement and recovery schemes 
that exploit inter-signal correlations [28-32]. However, by recovering sequential time instances of 
the sensed data individually, these schemes ignore intra-signal correlations. 

1.3 Distributed compressive sensing (DCS) 

In this paper we introduce a new theory for distributed compressive sensing (DCS) to enable new 
distributed coding algorithms that exploit both intra- and inter-signal correlation structures. In 
a typical DCS scenario, a number of sensors measure signals that are each individually sparse in 
some basis and also correlated from sensor to sensor. Each sensor separately encodes its signal 
by projecting it onto another, incoherent basis (such as a random one) and then transmits just a 
few of the resulting coefficients to a single collection point. Unlike the joint measurement setting 
described in Section 1.2, DCS requires no collaboration between the sensors during signal acquisi- 
tion. Nevertheless, we are able to exploit the inter-signal correlation by using all of the obtained 
measurements to recover all the signals simultaneously. Under the right conditions, a decoder at 
the collection point can recover each of the signals precisely. 

The DCS theory rests on a concept that we term the joint sparsity — the sparsity of the 
entire signal ensemble. The joint sparsity is often smaller than the aggregate over individual signal 
sparsities. Therefore, DCS offers a reduction in the number of measurements, in a manner analogous 
to the rate reduction offered by the Slepian-Wolf framework [13]. Unlike the single-signal definition 
of sparsity, however, there are numerous plausible ways in which joint sparsity could be defined. In 
this paper, we first provide a general framework for joint sparsity using algebraic formulations based 
on a graphical model. Using this framework, we derive bounds for the number of measurements 
necessary for recovery under a given signal ensemble model. Similar to Slepian-Wolf coding [13], the 
number of measurements required for each sensor must account for the minimal features unique to 
that sensor, while at the same time features that appear among multiple sensors must be amortized 
over the group. Our bounds are dependent on the dimensionality of the subspaces in which each 
group of signals resides; they afford a reduction in the number of measurements that we quantify
through the notions of joint and conditional sparsity, which are conceptually related to joint and
conditional entropies. The common thread is that dimensionality and entropy both quantify the
volume that the measurement and coding rates must cover. Our results are also applicable to cases
where the signal ensembles are measured jointly, as well as to the single-signal case.

^1 Roughly speaking, incoherence means that no element of one basis has a sparse representation in terms of the
other basis. This notion has a variety of formalizations in the CS literature [24-26].

While our general framework does not by design provide insights for computationally efficient 
recovery, we also provide interesting models for joint sparsity where our results carry through from 
the general framework to realistic settings with low-complexity algorithms. In the first model, 
each signal is itself sparse, and so we could use CS to separately encode and decode each signal. 
However, there also exists a framework wherein a joint sparsity model for the ensemble uses fewer 
total coefficients. In the second model, all signals share the locations of the nonzero coefficients. 
In the third model, no signal is itself sparse, yet there still exists a joint sparsity among the signals 
that allows recovery from significantly fewer than N measurements per sensor. For each model 
we propose tractable algorithms for joint signal recovery, followed by theoretical and empirical 
characterizations of the number of measurements per sensor required for accurate recovery. We 
show that, under these models, joint signal recovery can recover signal ensembles from significantly 
fewer measurements than would be required to recover each signal individually. In fact, for two of 
our three models we obtain best-possible performance that could not be bettered by an oracle that 
knew the indices of the nonzero entries of the signals.

This paper focuses primarily on the basic task of reducing the number of measurements for 
recovery of a signal ensemble in order to reduce the communication cost of source coding that 
ensemble. Our emphasis is on noiseless measurements of strictly sparse signals, where the optimal 
recovery relies on $\ell_0$-norm optimization,^2 which is computationally intractable. In practical settings,
additional criteria may be relevant for measuring performance. For example, the measurements will 
typically be real numbers that must be quantized, which gradually degrades the recovery quality as 
the quantization becomes coarser [33,34]. Characterizing DCS in light of practical considerations 
such as rate-distortion tradeoffs, power consumption in sensor networks, etc., are topics of future 
research [31,32]. 

1.4 Paper organization 

Section 2 overviews the single-signal CS theories and provides a new result on CS recovery. While 
some readers may be familiar with this material, we include it to make the paper self-contained. 
Section 3 introduces our general framework for joint sparsity models and proposes three example 
models for joint sparsity. We provide our detailed analysis for the general framework in Section 4; 
we then address the three models in Section 5. We close the paper with a discussion and conclusions 
in Section 6. Several appendices contain the proofs. 

2 Compressive Sensing Background 
2.1 Transform coding 

Consider a real-valued signal^3 $x \in \mathbb{R}^N$ indexed as $x(n)$, $n \in \{1, 2, \ldots, N\}$. Suppose that the basis
$\Psi = [\psi_1, \ldots, \psi_N]$ provides a K-sparse representation of x; that is,

$$x = \sum_{n=1}^{N} \vartheta(n)\,\psi_n = \sum_{k=1}^{K} \vartheta(n_k)\,\psi_{n_k}, \tag{1}$$

where x is a linear combination of K vectors chosen from $\Psi$, $\{n_k\}$ are the indices of those vectors,
and $\{\vartheta(n)\}$ are the coefficients; the concept is extendable to tight frames [8]. Alternatively, we can
write in matrix notation $x = \Psi\vartheta$, where x is an N × 1 column vector, the sparse basis matrix $\Psi$
is N × N with the basis vectors $\psi_n$ as columns, and $\vartheta$ is an N × 1 column vector with K nonzero
elements. Using $\|\cdot\|_p$ to denote the $\ell_p$ norm, we can write that $\|\vartheta\|_0 = K$; we can also write the
set of nonzero indices $\Omega \subseteq \{1, \ldots, N\}$, with $|\Omega| = K$. Various expansions, including wavelets [8],
Gabor bases [8], curvelets [35], etc., are widely used for representation and compression of natural
signals, images, and other data.

^2 The $\ell_0$ "norm" $\|x\|_0$ merely counts the number of nonzero entries in the vector x.

^3 Without loss of generality, we will focus on one-dimensional signals (vectors) for notational simplicity; the
extension to multi-dimensional signals, e.g., images, is straightforward.

The standard procedure for compressing sparse and nearly-sparse signals, known as transform 
coding, is to (i) acquire the full N-sample signal x; (ii) compute the complete set of transform
coefficients $\{\vartheta(n)\}$; (iii) locate the K largest, significant coefficients and discard the (many) small
coefficients; (iv) encode the values and locations of the largest coefficients. This procedure has three
inherent inefficiencies: First, for a high-dimensional signal, we must start with a large number of
samples N. Second, the encoder must compute all of the N transform coefficients $\{\vartheta(n)\}$, even
though it will discard all but K of them. Third, the encoder must encode the locations of the large 
coefficients, which requires increasing the coding rate since the locations change with each signal. 
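
As a rough illustration of steps (i)-(iv), the following sketch runs transform coding on a toy signal. The orthonormal DCT-II basis, the signal length, and the helper names (`dct_basis`, `transform_code`, `decode`) are assumptions made for this example only; the text does not prescribe a particular basis or encoder.

```python
import math

def dct_basis(N):
    """Return N orthonormal DCT-II basis vectors (an assumed choice of Psi)."""
    basis = []
    for k in range(N):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / N)
        basis.append([scale * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                      for n in range(N)])
    return basis

def transform_code(x, basis, K):
    """Steps (ii)-(iv): compute all N coefficients, keep the K largest."""
    # (ii) every one of the N coefficients is computed, even those discarded later
    coeffs = [sum(b[n] * x[n] for n in range(len(x))) for b in basis]
    # (iii) locate the K largest-magnitude coefficients
    order = sorted(range(len(coeffs)), key=lambda n: abs(coeffs[n]), reverse=True)
    keep = sorted(order[:K])
    # (iv) the code consists of both locations and values
    return [(n, coeffs[n]) for n in keep]

def decode(code, basis, N):
    """Rebuild the signal from (location, value) pairs."""
    x_hat = [0.0] * N
    for n, value in code:
        for i in range(N):
            x_hat[i] += value * basis[n][i]
    return x_hat

N, K = 16, 3
basis = dct_basis(N)
# (i) a full N-sample signal that is exactly K-sparse in the assumed basis
x = decode([(1, 2.0), (4, -1.5), (9, 0.75)], basis, N)
code = transform_code(x, basis, K)
x_hat = decode(code, basis, N)
max_err = max(abs(a - b) for a, b in zip(x, x_hat))
```

The encoder touches all N samples and all N coefficients before discarding N - K of them, and the code must carry the locations alongside the values; these are the inefficiencies listed above.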

We will focus our theoretical development on exactly K-sparse signals and defer discussion of
the more general situation of compressible signals where the coefficients decay rapidly with a power 
law but not to zero. Section 6 contains additional discussion on real-world compressible signals, 
and [36] presents simulation results. 

2.2 Incoherent projections 

These inefficiencies raise a simple question: For a given signal, is it possible to directly estimate 
the set of large $\vartheta(n)$'s that will not be discarded? While this seems improbable, Candès, Romberg,
and Tao [23, 25] and Donoho [24] have shown that a reduced set of projections can contain enough 
information to recover sparse signals. A framework to acquire sparse signals, often referred to as 
compressive sensing (CS) [37], has emerged that builds on this principle. 

In CS, we do not measure or encode the K significant $\vartheta(n)$ directly. Rather, we measure
and encode M < N projections $y(m) = \langle x, \phi_m^T \rangle$ of the signal onto a second set of functions
$\{\phi_m\}$, m = 1, 2, ..., M, where $\phi_m^T$ denotes the transpose of $\phi_m$ and $\langle \cdot, \cdot \rangle$ denotes the inner product.
In matrix notation, we measure $y = \Phi x$, where y is an M × 1 column vector and the measurement
matrix $\Phi$ is M × N with each row a measurement vector $\phi_m$. Since M < N, recovery of the signal
x from the measurements y is ill-posed in general; however the additional assumption of signal
sparsity makes recovery possible and practical. 

The CS theory tells us that when certain conditions hold, namely that the basis $\{\psi_n\}$ cannot
sparsely represent the vectors $\{\phi_m\}$ (a condition known as incoherence [24-26]) and the number
of measurements M is large enough (proportional to K), then it is indeed possible to recover the
set of large $\{\vartheta(n)\}$ (and thus the signal x) from the set of measurements $\{y(m)\}$ [24,25]. This
incoherence property holds for many pairs of bases, including for example, delta spikes and the sine 
waves of a Fourier basis, or the Fourier basis and wavelets. Signals that are sparsely represented 
in frames or unions of bases can be recovered from incoherent measurements in the same fashion. 
Significantly, this incoherence also holds with high probability between any arbitrary fixed basis 
or frame and a randomly generated one. In the sequel, we will focus our analysis on such random
measurement procedures. 
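
The measurement step $y = \Phi x$ with a random matrix can be sketched in a few lines. The dimensions, the seed, and the helper names below are illustrative assumptions; the signal is taken to be sparse in the canonical basis so that $\Psi = I$.

```python
import random

def gaussian_matrix(M, N, rng):
    """M x N matrix with i.i.d. standard Gaussian entries."""
    return [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]

def measure(Phi, x):
    """Compute y = Phi x; each y(m) is the inner product of row m with x."""
    return [sum(row[n] * x[n] for n in range(len(x))) for row in Phi]

rng = random.Random(0)      # fixed seed so the sketch is repeatable
N, M = 32, 8                # M < N: far fewer measurements than samples
x = [0.0] * N
x[3], x[17] = 1.0, -2.0     # a 2-sparse signal in the canonical basis
Phi = gaussian_matrix(M, N, rng)
y = measure(Phi, x)
```

The encoder never inspects which entries of x are nonzero; the same `Phi` would be applied to any signal, which is the sense in which random measurements are non-adaptive.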

2.3 Signal recovery via $\ell_0$-norm minimization

The recovery of the sparse set of significant coefficients $\{\vartheta(n)\}$ can be achieved using optimization
by searching for the signal with the sparsest coefficient vector $\{\vartheta(n)\}$ that agrees with the M
observed measurements in y (recall that M < N). Recovery relies on the key observation that,
under mild conditions on $\Phi$ and $\Psi$, the coefficient vector $\vartheta$ is the unique solution to the $\ell_0$-norm
minimization

$$\widehat{\vartheta} = \arg\min \|\vartheta\|_0 \quad \text{s.t.} \quad y = \Phi\Psi\vartheta \tag{2}$$

with overwhelming probability. (Thanks to the incoherence between the two bases, if the original
signal is sparse in the $\vartheta$ coefficients, then no other set of sparse signal coefficients $\vartheta'$ can yield the
same projections y.)

In principle, remarkably few incoherent measurements are required to recover a K-sparse signal
via $\ell_0$-norm minimization. More than K measurements must be taken to avoid ambiguity; the
following theorem, proven in Appendix A, establishes that K + 1 random measurements will suffice.
Similar results were established by Venkataramani and Bresler [38].

Theorem 1 Let $\Psi$ be an orthonormal basis for $\mathbb{R}^N$, and let $1 \leq K < N$. Then:

1. Let $\Phi$ be an M × N measurement matrix with i.i.d. Gaussian entries with $M \geq 2K$. Then all
signals $x = \Psi\vartheta$ having expansion coefficients $\vartheta \in \mathbb{R}^N$ that satisfy $\|\vartheta\|_0 = K$ can be recovered
uniquely from the M-dimensional measurement vector $y = \Phi x$ via the $\ell_0$-norm minimization
(2) with probability one over $\Phi$.

2. Let $x = \Psi\vartheta$ such that $\|\vartheta\|_0 = K$. Let $\Phi$ be an M × N measurement matrix with i.i.d.
Gaussian entries (notably, independent of x) with $M \geq K + 1$. Then x can be recovered
uniquely from the M-dimensional measurement vector $y = \Phi x$ via the $\ell_0$-norm minimization
(2) with probability one over $\Phi$.

3. Let $\Phi$ be an M × N measurement matrix, where $M \leq K$. Then, aside from pathological cases
(specified in the proof), no signal $x = \Psi\vartheta$ with $\|\vartheta\|_0 = K$ can be uniquely recovered from the
M-dimensional measurement vector $y = \Phi x$.

Remark 1 The second statement of the theorem differs from the first in the following respect: when
K < M < 2K, there will necessarily exist K-sparse signals x that cannot be uniquely recovered from
the M-dimensional measurement vector $y = \Phi x$. However, these signals form a set of measure zero
within the set of all K-sparse signals and can safely be avoided with high probability if $\Phi$ is randomly
generated independently of x.

Comparing the second and third statements of Theorem 1, we see that one measurement 
separates the achievable region, where perfect recovery is possible with probability one, from the 
converse region, where with overwhelming probability recovery is impossible. Moreover, Theorem 1 
provides a strong converse measurement region in a manner analogous to the strong channel coding 
converse theorems of information theory [12]. 

Unfortunately, solving the $\ell_0$-norm minimization problem is prohibitively complex, requiring a
combinatorial enumeration of the $\binom{N}{K}$ possible sparse subspaces. In fact, the $\ell_0$-norm minimization
problem in general is known to be NP-hard [39]. Yet another challenge is robustness; in the setting
of Theorem 1, the recovery may be very poorly conditioned. In fact, both of these considerations 
(computational complexity and robustness) can be addressed, but at the expense of slightly more 
measurements. 
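
To make the combinatorial cost concrete, here is a minimal brute-force sketch of the $\ell_0$ search in (2) for a toy problem with $\Psi = I$ and M = K + 1 Gaussian measurements. The problem sizes, the seed, the tolerance `tol`, and the small normal-equations solver are all assumptions of this example, not part of the paper; the loop over supports is exactly the enumeration that becomes intractable as N grows.

```python
import itertools
import random

def least_squares(columns, y):
    """Fit y with the given columns via the normal equations.
    Returns (coefficients, residual). Adequate only for tiny, well-conditioned cases."""
    k, M = len(columns), len(y)
    aug = [[sum(columns[i][m] * columns[j][m] for m in range(M)) for j in range(k)]
           + [sum(columns[i][m] * y[m] for m in range(M))] for i in range(k)]
    for col in range(k):                      # Gaussian elimination, partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        s = aug[i][k] - sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = s / aug[i][i]
    residual = [y[m] - sum(coeffs[i] * columns[i][m] for i in range(k))
                for m in range(M)]
    return coeffs, residual

def l0_recover(Phi, y, tol=1e-8):
    """Return the smallest support (and signal) that explains y exactly."""
    M, N = len(Phi), len(Phi[0])
    cols = [[Phi[m][n] for m in range(M)] for n in range(N)]
    for k in range(M + 1):                    # try sparsity levels in increasing order
        for support in itertools.combinations(range(N), k):
            coeffs, residual = least_squares([cols[n] for n in support], y)
            if max(abs(r) for r in residual) < tol:
                x_hat = [0.0] * N
                for n, c in zip(support, coeffs):
                    x_hat[n] = c
                return support, x_hat
    return None, None

rng = random.Random(1)
N, K = 8, 2
M = K + 1                                     # the regime of Theorem 1, statement 2
x = [0.0] * N
x[2], x[5] = 1.5, -0.7
Phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
y = [sum(Phi[m][n] * x[n] for n in range(N)) for m in range(M)]
support, x_hat = l0_recover(Phi, y)
```

Even at N = 8 the search visits dozens of candidate supports; the count grows as $\binom{N}{K}$, which is why the following sections trade a few extra measurements for tractable recovery.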

2.4 Signal recovery via $\ell_1$-norm minimization

The practical revelation that supports the new CS theory is that it is not necessary to solve the
$\ell_0$-norm minimization to recover the set of significant $\{\vartheta(n)\}$. In fact, a much easier problem yields
an equivalent solution (thanks again to the incoherence of the bases); we need only solve for the
smallest $\ell_1$-norm coefficient vector that agrees with the measurements y [24, 25]:

$$\widehat{\vartheta} = \arg\min \|\vartheta\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\vartheta. \tag{3}$$

This optimization problem, also known as Basis Pursuit, is significantly more approachable and 
can be solved with traditional linear programming techniques whose computational complexities 
are polynomial in N. 

There is no free lunch, however; according to the theory, more than K + 1 measurements
are required in order to recover sparse signals via Basis Pursuit. Instead, one typically requires
$M \geq cK$ measurements, where c > 1 is an overmeasuring factor. As an example, we quote a
result asymptotic in N. For simplicity, we assume that the sparsity scales linearly with N; that is,
K = SN, where we call S the sparsity rate.

Theorem 2 [39-41] Set K = SN with $0 < S \ll 1$. Then there exists an overmeasuring factor
$c(S) = O(\log(1/S))$, $c(S) > 1$, such that, for a K-sparse signal x in basis $\Psi$, the following
statements hold:

1. The probability of recovering x via $\ell_1$-norm minimization from $(c(S) + \epsilon)K$ random projections,
$\epsilon > 0$, converges to one as $N \to \infty$.

2. The probability of recovering x via $\ell_1$-norm minimization from $(c(S) - \epsilon)K$ random projections,
$\epsilon > 0$, converges to zero as $N \to \infty$.

In an illuminating series of papers, Donoho and Tanner [40-42] have characterized the overmeasuring
factor $c(S)$ precisely. In our work, we have noticed that the overmeasuring factor is quite
similar to $\log_2(1 + S^{-1})$. We find this expression a useful rule of thumb to approximate the precise
overmeasuring ratio. Additional overmeasuring is proven to provide robustness to measurement 
noise and quantization error [25]. 
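
The rule of thumb is easy to evaluate numerically. The sketch below is only a back-of-the-envelope calculator; the function names and the example values of N and K are assumptions, and the result approximates rather than reproduces the exact Donoho-Tanner factor.

```python
import math

def overmeasuring_rule_of_thumb(S):
    """Approximate c(S) by log2(1 + 1/S) for a sparsity rate S = K/N in (0, 1]."""
    if not 0.0 < S <= 1.0:
        raise ValueError("sparsity rate S must lie in (0, 1]")
    return math.log2(1.0 + 1.0 / S)

def approx_measurements(N, K):
    """Rough estimate of M = c(S) K for Basis Pursuit, rounded up."""
    return math.ceil(overmeasuring_rule_of_thumb(K / N) * K)

c = overmeasuring_rule_of_thumb(0.05)     # sparsity rate S = 0.05
M_est = approx_measurements(1000, 50)     # N = 1000, K = 50
```

For N = 1000 and K = 50 the estimate is roughly 4.4 measurements per nonzero coefficient, compared with the K + 1 total that $\ell_0$ minimization needs in principle.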

Throughout this paper we use the abbreviated notation c to describe the overmeasuring factor
required in various settings even though $c(S)$ depends on the sparsity K and signal length N.

2.5 Signal recovery via greedy pursuit 

Iterative greedy algorithms have also been developed to recover the signal x from the measurements
y. The Orthogonal Matching Pursuit (OMP) algorithm, for example, iteratively selects the vectors
from the matrix $\Phi\Psi$ that contain most of the energy of the measurement vector y. The selection
at each iteration is made based on inner products between the columns of $\Phi\Psi$ and a residual; the
residual reflects the component of y that is orthogonal to the previously selected columns. The
algorithm has been proven to successfully recover the acquired signal from incoherent measurements
with high probability, at the expense of slightly more measurements [26, 43]. Algorithms inspired by
OMP, such as regularized orthogonal matching pursuit [44], CoSaMP [45], and Subspace Pursuit [46] 
have been shown to attain similar guarantees to those of their optimization-based counterparts. In 
the following, we will exploit both Basis Pursuit and greedy algorithms for recovering jointly sparse 
signals from incoherent measurements. 
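
The OMP iteration described above can be sketched as follows, assuming $\Psi = I$ so that the columns of $\Phi$ play the role of the columns of $\Phi\Psi$. The stopping rule (exactly K iterations), the problem sizes, and the small least-squares helper are assumptions of this example rather than details taken from the cited papers.

```python
import random

def least_squares(columns, y):
    """Fit y with the given columns via the normal equations; returns (coeffs, residual)."""
    k, M = len(columns), len(y)
    aug = [[sum(columns[i][m] * columns[j][m] for m in range(M)) for j in range(k)]
           + [sum(columns[i][m] * y[m] for m in range(M))] for i in range(k)]
    for col in range(k):                      # Gaussian elimination, partial pivoting
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    coeffs = [0.0] * k
    for i in reversed(range(k)):              # back substitution
        s = aug[i][k] - sum(aug[i][j] * coeffs[j] for j in range(i + 1, k))
        coeffs[i] = s / aug[i][i]
    residual = [y[m] - sum(coeffs[i] * columns[i][m] for i in range(k))
                for m in range(M)]
    return coeffs, residual

def omp(A, y, K):
    """Run K iterations of Orthogonal Matching Pursuit on y = A x."""
    M, N = len(A), len(A[0])
    cols = [[A[m][n] for m in range(M)] for n in range(N)]
    support, coeffs, residual = [], [], list(y)
    for _ in range(K):
        # select the unused column with the largest inner product with the residual
        candidates = [n for n in range(N) if n not in support]
        best = max(candidates,
                   key=lambda n: abs(sum(cols[n][m] * residual[m] for m in range(M))))
        support.append(best)
        # re-fit on all selected columns; the new residual is orthogonal to them
        coeffs, residual = least_squares([cols[n] for n in support], y)
    x_hat = [0.0] * N
    for n, c in zip(support, coeffs):
        x_hat[n] = c
    return support, x_hat, residual

rng = random.Random(2)
N, M, K = 20, 10, 2
x = [0.0] * N
x[4], x[11] = 2.0, -1.0
Phi = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
y = [sum(Phi[m][n] * x[n] for n in range(N)) for m in range(M)]
support, x_hat, residual = omp(Phi, y, K)
```

Whether `support` matches the true support depends on the random draw of `Phi`; the guarantee quoted above holds only with high probability and with enough measurements. What always holds is that the residual is orthogonal to every selected column.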

2.6 Properties of random measurements

In addition to offering substantially reduced measurement rates, CS has many attractive and
intriguing properties, particularly when we employ random projections at the sensors. Random
measurements are universal in the sense that any sparse basis can be used, allowing the same en- 
coding strategy to be applied in different sensing environments. Random measurements are also 
future-proof: if a better sparsity-inducing basis is found for the signals, then the same measurements
can be used to recover a more accurate view of the environment. Random coding is also robust: the 
measurements coming from each sensor have equal priority, unlike Fourier or wavelet coefficients in 
current coders. Finally, random measurements allow a progressively better recovery of the data as 
more measurements are obtained; one or more measurements can also be lost without corrupting 
the entire recovery. 

2.7 Related work 

Several researchers have formulated joint measurement settings for CS in sensor networks that 
exploit inter-signal correlations [28-32]. In their approaches, each sensor $n \in \{1, 2, \ldots, N\}$ simultaneously
records a single reading $x(n)$ of some spatial field (temperature at a certain time, for
example).^4 Each of the sensors generates a pseudorandom sequence $r_n(m)$, m = 1, 2, ..., M, and
modulates the reading as $x(n) r_n(m)$. Each sensor n then transmits its M numbers in sequence to
the collection point where the measurements are aggregated, obtaining M measurements
$y(m) = \sum_{n=1}^{N} x(n) r_n(m)$. Thus, defining $x = [x(1), x(2), \ldots, x(N)]^T$ and $\phi_m = [r_1(m), r_2(m), \ldots, r_N(m)]$,
the collection point automatically receives the measurement vector $y = [y(1), y(2), \ldots, y(M)]^T = \Phi x$
after M transmission steps. The samples $x(n)$ of the spatial field can then be recovered using
CS provided that x has a sparse representation in a known basis. These methods have a major 
limitation: since they operate at a single time instant, they exploit only inter-signal and not intra- 
signal correlations; that is, they essentially assume that the sensor field is i.i.d. from time instant 
to time instant. In contrast, we will develop signal models and algorithms that are agnostic to the 
spatial sampling structure and that exploit both inter- and intra-signal correlations. 
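
The joint measurement scheme of [28-32] described above amounts to accumulating modulated readings at the collection point. The sketch below assumes ±1 pseudorandom sequences and made-up sensor readings; the cited works may use other sequences, and the per-sensor seeds are purely illustrative.

```python
import random

N, M = 5, 3                                  # N sensors, M transmission steps
readings = [20.5, 21.0, 19.8, 20.1, 22.3]    # x(n): hypothetical sensor readings

# Each sensor n generates its own pseudorandom sequence r_n(m), m = 1..M.
sequences = []
for n in range(N):
    sensor_rng = random.Random(1000 + n)     # assumed per-sensor seed
    sequences.append([sensor_rng.choice((-1.0, 1.0)) for _ in range(M)])

# The collection point aggregates the modulated readings x(n) r_n(m).
y = [0.0] * M
for n in range(N):
    for m in range(M):
        y[m] += readings[n] * sequences[n][m]

# Equivalent matrix form: row m of Phi is phi_m = [r_1(m), ..., r_N(m)].
Phi = [[sequences[n][m] for n in range(N)] for m in range(M)]
y_matrix = [sum(Phi[m][n] * readings[n] for n in range(N)) for m in range(M)]
```

Each measurement mixes readings from all sensors at one time instant, which is why this scheme captures inter-signal structure but carries no information about how any one sensor's signal evolves over time.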

Recent work has adapted DCS to the finite rate of innovation signal acquisition framework [47] 
and to the continuous-time setting [48]. Since the original submission of this paper, additional work 
has focused on the analysis and proposal of recovery algorithms for jointly sparse signals [49, 50]. 

3 Joint Sparsity Signal Models 

In this section, we generalize the notion of a signal being sparse in some basis to the notion of an 
ensemble of signals being jointly sparse. 

3.1 Notation 

We will use the following notation for signal ensembles and our measurement model. Let
$\Lambda := \{1, 2, \ldots, J\}$ denote the set of indices for the J signals in the ensemble. Denote the signals in the
ensemble by $x_j$, with $j \in \Lambda$, and assume that each signal $x_j \in \mathbb{R}^N$. We use $x_j(n)$ to denote sample
n in signal j, and assume for the sake of illustration (but without loss of generality) that these
signals are sparse in the canonical basis, i.e., $\Psi = I$. The entries of the signal can take arbitrary
real values.

We denote by $\Phi_j$ the measurement matrix for signal j; $\Phi_j$ is $M_j \times N$ and, in general, the
entries of $\Phi_j$ are different for each j. Thus, $y_j = \Phi_j x_j$ consists of $M_j < N$ random measurements
of $x_j$. We will emphasize random i.i.d. Gaussian matrices $\Phi_j$ in the following, but other schemes
are possible, including random ±1 Bernoulli/Rademacher matrices, and so on.

^4 Note that in Section 2.7 only, N refers to the number of sensors, since each sensor acquires a signal sample.

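To compactly represent the signal and measurement ensembles, we denote $M = \sum_{j \in \Lambda} M_j$ and
define $X \in \mathbb{R}^{JN}$, $Y \in \mathbb{R}^{M}$, and $\Phi \in \mathbb{R}^{M \times JN}$ as

$$
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_J \end{bmatrix}, \quad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \quad \text{and} \quad
\Phi = \begin{bmatrix}
\Phi_1 & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \Phi_2 & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \Phi_J
\end{bmatrix}, \tag{4}
$$

with $\mathbf{0}$ denoting a matrix of appropriate size with all entries equal to 0. We then have $Y = \Phi X$.
Equation (4) shows that separate measurement matrices have a characteristic block-diagonal
structure when the entries of the sparse vector are grouped by signal.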
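
The block-diagonal structure in (4) can be checked numerically: measuring each signal separately and stacking the results gives the same vector as applying the block-diagonal $\Phi$ to the stacked ensemble X. The sizes, seed, and helper names below are assumptions for illustration.

```python
import random

def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def block_diag(blocks):
    """Place the given matrices on the diagonal of a larger zero matrix."""
    total_cols = sum(len(B[0]) for B in blocks)
    rows, offset = [], 0
    for B in blocks:
        width = len(B[0])
        for row in B:
            rows.append([0.0] * offset + list(row)
                        + [0.0] * (total_cols - offset - width))
        offset += width
    return rows

rng = random.Random(0)
J, N = 3, 6
M_j = [2, 3, 4]                               # each sensor may take a different M_j
signals = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(J)]
Phis = [[[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(m)] for m in M_j]

# Separate encoding: sensor j computes y_j = Phi_j x_j on its own.
separate = [v for Phi_j, x_j in zip(Phis, signals) for v in matvec(Phi_j, x_j)]

# Ensemble view: stack the signals and apply the block-diagonal Phi.
X = [v for x_j in signals for v in x_j]
Phi = block_diag(Phis)
Y = matvec(Phi, X)
```

The zero blocks encode the absence of collaboration: no measurement of sensor j depends on any other sensor's signal.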

Below we propose a general framework for joint sparsity models (JSMs) and three example 
JSMs that apply in different situations. 



3.2 General framework for joint sparsity 

We now propose a general framework to quantify the sparsity of an ensemble of correlated signals 
$x_1, x_2, \ldots, x_J$, which allows us to compare the complexities of different signal ensembles and to
quantify their measurement requirements. The framework is based on a factored representation of 
the signal ensemble that decouples its location and value information. 

To motivate this factored representation, we begin by examining the structure of a single sparse
signal, where $x \in \mathbb{R}^N$ with $K \ll N$ nonzero entries. As an alternative to the notation used in (1),
we can decouple the location and value information in x by writing $x = P\theta$, where $\theta \in \mathbb{R}^K$ contains
only the nonzero entries of x, and P is an identity submatrix, i.e., P contains K columns of the
N × N identity matrix I. Any K-sparse signal can be written in similar fashion. To model the
set of all possible sparse signals, we can then let $\mathcal{P}$ be the set of all identity submatrices of all
possible sizes N × K', with $1 \leq K' \leq N$. We refer to $\mathcal{P}$ as a sparsity model. Whether a signal
is sufficiently sparse is defined in the context of this model: given a signal x, one can consider all
possible factorizations $x = P\theta$ with $P \in \mathcal{P}$. Among these factorizations, the unique representation
with smallest dimensionality for $\theta$ equals the sparsity level of the signal x under the model $\mathcal{P}$.
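
A small sketch of the factorization $x = P\theta$ for a single signal follows. The example signal and the helper names are assumptions; the point is only that P carries the locations and $\theta$ carries the values.

```python
def factor_sparse(x):
    """Split x into an identity submatrix P (locations) and a vector theta (values)."""
    support = [n for n, v in enumerate(x) if v != 0]
    N, K = len(x), len(support)
    # Column k of P is the column of the N x N identity matrix indexed by support[k].
    P = [[1 if n == support[k] else 0 for k in range(K)] for n in range(N)]
    theta = [x[n] for n in support]
    return P, theta

def apply_location_matrix(P, theta):
    """Compute P theta."""
    return [sum(p * t for p, t in zip(row, theta)) for row in P]

x = [0.0, 3.0, 0.0, 0.0, -1.5, 0.0]
P, theta = factor_sparse(x)
x_rebuilt = apply_location_matrix(P, theta)
```

Here P has N = 6 rows and K = 2 columns, so the sparsity level of x under this model is 2.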

In the signal ensemble case, we consider factorizations of the form $X = P\Theta$ where $X \in \mathbb{R}^{JN}$
as above, $P \in \mathbb{R}^{JN \times \delta}$, and $\Theta \in \mathbb{R}^{\delta}$ for various integers $\delta$. We refer to P and $\Theta$ as the location
matrix and value vector, respectively. A joint sparsity model (JSM) is defined in terms of a set $\mathcal{P}$
of admissible location matrices P with varying numbers of columns; we specify below additional
conditions that the matrices P must satisfy for each model. For a given ensemble X, we let
$\mathcal{P}_F(X) \subseteq \mathcal{P}$ denote the set of feasible location matrices $P \in \mathcal{P}$ for which a factorization $X = P\Theta$
exists. We define the joint sparsity level of the signal ensemble as follows.



Definition 1 The joint sparsity level D of the signal ensemble X is the number of columns of the
smallest matrix $P \in \mathcal{P}_F(X)$.

In contrast to the single-signal case, there are several natural choices for what matrices P
should be members of a joint sparsity model $\mathcal{P}$. We restrict our attention in the sequel to what
we call common/innovation component JSMs. In these models each signal $x_j$ is generated as a
combination of two components: (i) a common component $z_C$, which is present in all signals, and
(ii) an innovation component $z_j$, which is unique to each signal. These combine additively, giving

$$x_j = z_C + z_j, \quad j \in \Lambda.$$



Note, however, that the individual components might be zero-valued in specific scenarios. We can
express the component signals as

$$z_C = P_C \theta_C, \quad z_j = P_j \theta_j, \quad j \in \Lambda,$$



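where $\theta_C \in \mathbb{R}^{K_C}$ and each $\theta_j \in \mathbb{R}^{K_j}$ have nonzero entries. Each
matrix $P \in \mathcal{P}$ that can express such signals $\{x_j\}$ has the form

$$P = \begin{bmatrix} P_C & P_1 & 0 & \cdots & 0 \\ P_C & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ P_C & 0 & 0 & \cdots & P_J \end{bmatrix}, \tag{5}$$

where $P_C$, $\{P_j\}_{j \in \Lambda}$ are identity submatrices. We define the value vector as
$\Theta = [\theta_C^T \; \theta_1^T \; \theta_2^T \; \cdots \; \theta_J^T]^T$, where
$\theta_C \in \mathbb{R}^{K_C}$ and each $\theta_j \in \mathbb{R}^{K_j}$, to obtain $X = P\Theta$. Although
the values of $K_C$ and $K_j$ are dependent on the matrix $P$, we omit this dependency in the sequel
for brevity, except when necessary for clarity.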

If a signal ensemble $X = P\Theta$, $\Theta \in \mathbb{R}^{\delta}$, were to be generated by a selection
of $P_C$ and $\{P_j\}_{j \in \Lambda}$ where all $J + 1$ identity submatrices share a common column
vector, then $P$ would not be full rank. In other cases, we may observe a vector $\Theta$ that has
zero-valued entries; i.e., we may have $\theta_j(k) = 0$ for some $1 \le k \le K_j$ and some
$j \in \Lambda$, or $\theta_C(k) = 0$ for some $1 \le k \le K_C$. In both of these cases, by removing one
instance of this column from any of the identity submatrices, one can obtain a matrix $Q$ with fewer
columns for which there exists $\Theta' \in \mathbb{R}^{\delta - 1}$ that gives $X = Q\Theta'$. If
$Q \in \mathcal{P}$, then we term this phenomenon sparsity reduction. Sparsity reduction, when present,
reduces the effective joint sparsity of a signal ensemble. As an example of sparsity reduction,
consider $J = 2$ signals of length $N = 2$. Consider the coefficient $z_C(1) \ne 0$ of the common
component $z_C$ and the corresponding innovation coefficients $z_1(1), z_2(1) \ne 0$. Suppose that all
other coefficients are zero. The location matrix $P$ that arises is

$$P = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$

The span of this location matrix (i.e., the set of signal ensembles $X$ that it can generate) remains
unchanged if we remove any one of the columns, i.e., if we drop any entry of the value vector $\Theta$.
This provides us with a lower-dimensional representation $\Theta'$ of the same signal ensemble $X$
under the JSM $\mathcal{P}$; the joint sparsity of $X$ is $D = 2$.
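
The sparsity reduction example above can be checked numerically. The following is a minimal sketch, not part of the original development: the helper names `rank` and `drop_column` are hypothetical, and the rank is computed with exact rational arithmetic so that no numerical tolerance is needed. It confirms that the $4 \times 3$ location matrix has rank 2 and that deleting any single column leaves the rank (and hence the span) unchanged.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (given as a list of rows) via exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def drop_column(rows, c):
    """Return a copy of the matrix with column c removed."""
    return [row[:c] + row[c + 1:] for row in rows]


# Location matrix of the example (J = 2, N = 2); columns are [P_C | P_1 | P_2].
P = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 0],
]

full_rank = rank(P)
reduced_ranks = [rank(drop_column(P, c)) for c in range(3)]
print(full_rank, reduced_ranks)
```

Because each reduced matrix is a subset of the columns of $P$ and has the same rank, its span coincides with that of $P$, which is the sense in which the joint sparsity is $D = 2$.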

3.3 Example joint sparsity models

Since different real-world scenarios lead to different forms of correlation within an ensemble of
sparse signals, we consider several possible designs for a JSM $\mathcal{P}$. The distinctions among
our three JSMs concern the differing sparsity assumptions regarding the common and innovation
components.

3.3.1 JSM-1: Sparse common component + innovations

In this model, we suppose that each signal contains a common component $z_C$ that is sparse plus
an innovation component $z_j$ that is also sparse. Thus, this joint sparsity model (JSM-1)
$\mathcal{P}$ is represented by the set of all matrices of the form (5) with $K_C$ and all $K_j$
smaller than $N$. Assuming that sparsity reduction is not possible, the joint sparsity is
$D = K_C + \sum_{j \in \Lambda} K_j$.

A practical situation well-modeled by this framework is a group of sensors measuring temperatures
at a number of outdoor locations throughout the day. The temperature readings $x_j$ have
both temporal (intra-signal) and spatial (inter-signal) correlations. Global factors, such as the sun
and prevailing winds, could have an effect $z_C$ that is both common to all sensors and structured
enough to permit sparse representation. More local factors, such as shade, water, or animals, could
contribute localized innovations $z_j$ that are also structured (and hence sparse). A similar scenario
could be imagined for a network of sensors recording light intensities, air pressure, or other phenom- 
ena. All of these scenarios correspond to measuring properties of physical processes that change 
smoothly in time and in space and thus are highly correlated [51, 52]. 

3.3.2 JSM-2: Common sparse supports 

In this model, the common component $z_C$ is equal to zero, each innovation component $z_j$ is
sparse, and the innovations $\{z_j\}$ share the same sparse support but have different nonzero
coefficients. To formalize this setting in a joint sparsity model (JSM-2) we let $\mathcal{P}$
represent the set of all matrices of the form (5), where $P_C = \emptyset$ and $P_j = P$ for all
$j \in \Lambda$. Here $P$ denotes an arbitrary identity submatrix of size $N \times K$, with
$K \ll N$. For a given $X = P\Theta$, we may again partition the value vector
$\Theta = [\theta_1^T \; \theta_2^T \; \cdots \; \theta_J^T]^T$, where each $\theta_j \in \mathbb{R}^K$.
It is easy to see that the matrices $P$ from JSM-2 are full rank. Therefore, when sparsity reduction
is not possible, the joint sparsity is $D = JK$.

The JSM-2 model is immediately applicable to acoustic and RF sensor arrays, where each 
sensor acquires a replica of the same Fourier-sparse signal but with phase shifts and attenuations 
caused by signal propagation. In this case, it is critical to recover each one of the sensed signals. 
Another useful application for this framework is MIMO communication [53].

Similar signal models have been considered in the area of simultaneous sparse approxima- 
tion [53-55]. In this setting, a collection of sparse signals share the same expansion vectors from 
a redundant dictionary. The sparse approximation can be recovered via greedy algorithms such as 
Simultaneous Orthogonal Matching Pursuit (SOMP) [53, 54] or MMV Order Recursive Matching 
Pursuit (M-ORMP) [55]. We use the SOMP algorithm in our setting (Section 5.2) to recover from 
incoherent measurements an ensemble of signals sharing a common sparse structure. 

3.3.3 JSM-3: Nonsparse common component + sparse innovations

In this model, we suppose that each signal contains an arbitrary common component $z_C$ and a
sparse innovation component $z_j$; this model extends JSM-1 by relaxing the assumption that the
common component $z_C$ has a sparse representation. To formalize this setting in the JSM-3 model,
we let $\mathcal{P}$ represent the set of all matrices (5) in which $P_C = I$, the $N \times N$
identity matrix. This implies each $K_j$ is smaller than $N$ while $K_C = N$; thus, we obtain
$\theta_C \in \mathbb{R}^N$ and $\theta_j \in \mathbb{R}^{K_j}$. Assuming that sparsity reduction is not
possible, the joint sparsity is $D = N + \sum_{j \in \Lambda} K_j$. We also consider the specific case
where the supports of the innovations are shared by all signals, which extends JSM-2; in this case
we will have $P_j = P$ for all $j \in \Lambda$, with $P$ an identity submatrix of size $N \times K$.
It is easy to see that in this case sparsity reduction is possible, and so the joint sparsity
can drop to $D = N + (J - 1)K$. Note that separate CS recovery is impossible in JSM-3 with any
fewer than $N$ measurements per sensor, since the common component is not sparse. However, we
will demonstrate that joint CS recovery can indeed exploit the common structure.
A practical situation well-modeled by this framework is where several sources are recorded by
different sensors together with a background signal that is not sparse in any basis. Consider, for 
example, a verification system in a component production plant, where cameras acquire snapshots 
of each component to check for manufacturing defects. While each image could be extremely 
complicated, and hence nonsparse, the ensemble of images will be highly correlated, since each
camera is observing the same device with minor (sparse) variations. 

JSM-3 can also be applied in non-distributed scenarios. For example, it motivates the compres- 
sion of data such as video, where the innovations or differences between video frames may be sparse, 
even though a single frame may not be very sparse. In this case, JSM-3 suggests that we encode 
each video frame separately using CS and then decode all frames of the video sequence jointly. This 
has the advantage of moving the bulk of the computational complexity to the video decoder. The 
PRISM system proposes a similar scheme based on Wyner-Ziv distributed encoding [56]. 

There are many possible joint sparsity models beyond those introduced above, as well as beyond
the common and innovation component signal model. Further work will yield new JSMs suitable 
for other application scenarios; an example application consists of multiple cameras taking digital 
photos of a common scene from various angles [57]. Extensions are discussed in Section 6. 

4 Theoretical Bounds on Measurement Rates 

In this section, we seek conditions on $M = (M_1, M_2, \ldots, M_J)$, the tuple of the numbers of
measurements taken by each sensor, such that we can guarantee perfect recovery of $X$ given $Y$. To this end, we
provide a graphical model for the general framework provided in Section 3.2. This graphical model 
is fundamental in the derivation of the number of measurements needed for each sensor, as well 
as in the formulation of a combinatorial recovery procedure. Thus, we generalize Theorem 1 to 
the distributed setting to obtain fundamental limits on the number of measurements that enable 
recovery of sparse signal ensembles. 

Based on the models presented in Section 3, recovering $X$ requires determining a value vector
$\Theta$ and location matrix $P$ such that $X = P\Theta$. Two challenges immediately present
themselves. First, a given measurement depends only on some of the components of $\Theta$, and the
measurement budget should be adjusted between the sensors according to the information that can be
gathered on the components of $\Theta$. For example, if a component $\Theta(d)$ does not affect any
signal coefficient $x_j(\cdot)$ in sensor $j$, then the corresponding measurements $y_j$ provide no
information about $\Theta(d)$. Second, the decoder must identify a location matrix
$P \in \mathcal{P}_F(X)$ from the set $\mathcal{P}$ and the measurements $Y$.

4.1 Modeling dependencies using bipartite graphs 

We introduce a graphical representation that captures the dependencies between the measurements
in $Y$ and the value vector $\Theta$, represented by $\Phi$ and $P$. Consider a feasible
decomposition of $X$ into a full-rank matrix $P \in \mathcal{P}_F(X)$ and the corresponding $\Theta$;
the matrix $P$ defines the sparsities of the common and innovation components $K_C$ and $K_j$,
$1 \le j \le J$, as well as the joint sparsity $D = K_C + \sum_{j=1}^{J} K_j$. Define the following
sets of vertices: (i) the set of value vertices $V_V$ has elements with indices
$d \in \{1, \ldots, D\}$ representing the entries of the value vector $\Theta(d)$, and (ii) the set of
measurement vertices $V_M$ has elements with indices $(j, m)$ representing the measurements
$y_j(m)$, with $j \in \Lambda$ and $m \in \{1, \ldots, M_j\}$. The cardinalities for these sets are
$|V_V| = D$ and $|V_M| = \sum_{j \in \Lambda} M_j$, respectively.

We now introduce a bipartite graph $G = (V_V, V_M, E)$ that represents the relationships between
the entries of the value vector and the measurements (see [4] for details). The set of edges $E$ is
defined as follows:

• For every $d \in \{1, 2, \ldots, K_C\} \subseteq V_V$ and $j \in \Lambda$ such that column $d$ of
$P_C$ does not also appear as a column of $P_j$, we have an edge connecting $d$ to each vertex
$(j, m) \in V_M$ for $1 \le m \le M_j$.

• For every $d \in \{K_C + 1, K_C + 2, \ldots, D\} \subseteq V_V$, we consider the sensor $j$
associated with column $d$ of $P$, and we have an edge connecting $d$ to each vertex
$(j, m) \in V_M$ for $1 \le m \le M_j$.

In words, we say that $y_j(m)$, the $m$-th measurement of sensor $j$, measures $\Theta(d)$ if the
vertex $d \in V_V$ is linked to the vertex $(j, m) \in V_M$ in the graph $G$. An example graph for a
distributed sensing setting is shown in Figure 1.

Figure 1: Bipartite graph for distributed compressive sensing (DCS). The bipartite graph
$G = (V_V, V_M, E)$ indicates the relationship between the value vector coefficients and the
measurements.
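
The two edge rules can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions rather than the construction used in the proofs: the function `build_edges` and its argument names are hypothetical, each identity-submatrix column is identified by the index $n$ of its single nonzero entry, and value vertices are numbered in the column order of (5), i.e., the $K_C$ common columns first, followed by the innovation columns of sensor 1, sensor 2, and so on.

```python
def build_edges(common_support, innovation_supports, num_measurements):
    """
    Build the edge set E of the bipartite graph G = (V_V, V_M, E).

    common_support:      list of indices n selected by the columns of P_C
    innovation_supports: dict j -> list of indices n selected by the columns of P_j
    num_measurements:    dict j -> M_j
    Returns a set of pairs (d, (j, m)) with d in 1..D and m in 1..M_j.
    """
    sensors = sorted(innovation_supports)
    edges = set()

    # Rule 1: common value vertex d connects to every measurement of sensor j,
    # unless the same identity column also appears in P_j (overlap).
    for d, n in enumerate(common_support, start=1):
        for j in sensors:
            if n in innovation_supports[j]:
                continue
            for m in range(1, num_measurements[j] + 1):
                edges.add((d, (j, m)))

    # Rule 2: each innovation value vertex connects to all measurements
    # of the one sensor that owns it.
    d = len(common_support)
    for j in sensors:
        for _ in innovation_supports[j]:
            d += 1
            for m in range(1, num_measurements[j] + 1):
                edges.add((d, (j, m)))
    return edges


# Toy ensemble: K_C = 2, K_1 = 1, K_2 = 1, with an overlap at index 3 in sensor 1.
common_support = [1, 3]
innovation_supports = {1: [3], 2: [2]}
num_measurements = {1: 2, 2: 1}
edges = build_edges(common_support, innovation_supports, num_measurements)
print(sorted(edges))
```

In this toy case the common coefficient at index 3 is overlapped by the innovation of sensor 1, so value vertex 2 is connected only to the measurement of sensor 2.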

4.2 Quantifying redundancies 

In order to obtain sharp bounds on the number of measurements needed, our analysis of the
measurement process must account for redundancies between the locations of the nonzero coefficients
in the common and innovation components. To that end, we consider the overlaps between common
and innovation components in each signal. When we have $z_C(n) \ne 0$ and $z_j(n) \ne 0$ for a
certain signal $j$ and some index $1 \le n \le N$, we cannot recover the values of both coefficients
from the measurements of this signal alone; therefore, we will need to recover $z_C(n)$ using
measurements of other signals that do not feature the same overlap. We thus quantify the size of the
overlap for all subsets of signals $\Gamma \subseteq \Lambda$ under a feasible representation given by
$P$ and $\Theta$, as described in Section 3.2.

Definition 2 The overlap size for the set of signals $\Gamma \subseteq \Lambda$, denoted
$K_C(\Gamma, P)$, is the number of indices in which there is overlap between the common and the
innovation component supports at all signals $j \notin \Gamma$:

$$K_C(\Gamma, P) = \left| \{ n \in \{1, \ldots, N\} : z_C(n) \ne 0 \text{ and } \forall j \notin \Gamma,\ z_j(n) \ne 0 \} \right|. \tag{6}$$

We also define $K_C(\Lambda, P) = K_C(P)$ and $K_C(\emptyset, P) = 0$.

For $\Gamma \subseteq \Lambda$, $K_C(\Gamma, P)$ provides a penalty term due to the need for recovery
of common component coefficients that are overlapped by innovations in all other signals
$j \notin \Gamma$. Intuitively, for each entry counted in $K_C(\Gamma, P)$, some sensor in $\Gamma$
must take one measurement to account for that entry of the common component; it is impossible to
recover such entries from measurements made by sensors outside of $\Gamma$. When all signals
$j \in \Lambda$ are considered, it is clear that all of the common component coefficients must be
recovered from the obtained measurements.
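
As an illustration of Definition 2, the overlap size can be computed directly from the component vectors. The sketch below is hypothetical (the function name `overlap_size` and the toy vectors are assumptions made for the example); it evaluates (6) and applies the convention $K_C(\emptyset, P) = 0$ explicitly.

```python
def overlap_size(z_common, z_innov, gamma):
    """
    Overlap size K_C(Gamma, P) of equation (6).

    z_common: list of length N holding z_C
    z_innov:  dict j -> list of length N holding z_j
    gamma:    set of sensor indices (subset of the keys of z_innov)
    """
    if not gamma:
        return 0  # K_C(empty set, P) = 0 by definition
    outside = [j for j in z_innov if j not in gamma]
    return sum(
        1
        for n, c in enumerate(z_common)
        if c != 0 and all(z_innov[j][n] != 0 for j in outside)
    )


# Toy example with N = 5 and J = 2 (values chosen arbitrarily).
z_c = [1.0, 0.0, 2.0, 0.0, 3.0]
z = {
    1: [0.5, 0.0, 0.0, 0.0, 0.0],
    2: [0.0, 0.0, 1.5, 0.0, 0.7],
}

for gamma in (set(), {1}, {2}, {1, 2}):
    print(sorted(gamma), overlap_size(z_c, z, gamma))
```

For $\Gamma = \Lambda$ the condition over $j \notin \Gamma$ is vacuous, so the function returns the number of nonzero common coefficients, consistent with $K_C(\Lambda, P) = K_C(P)$.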
4.3 Measurement bounds

Converse and achievable bounds for the number of measurements necessary for DCS recovery are
given below. Our bounds consider each subset of sensors $\Gamma \subseteq \Lambda$, since the cost of
sensing the common component can be amortized across sensors: it may be possible to reduce the
rate at one sensor $j_1 \in \Gamma$ (up to a point), as long as other sensors in $\Gamma$ offset the
rate reduction. We quantify the reduction possible through the following definition.

Definition 3 The conditional sparsity of the set of signals $\Gamma$ is the number of entries of the
vector $\Theta$ that must be recovered by measurements $y_j$, $j \in \Gamma$:

$$K_{\mathrm{cond}}(\Gamma, P) = \left( \sum_{j \in \Gamma} K_j(P) \right) + K_C(\Gamma, P).$$

The joint sparsity gives the number of degrees of freedom for the signals in $\Lambda$, while the
conditional sparsity gives the number of degrees of freedom for signals in $\Gamma$ when the signals
in $\Lambda \setminus \Gamma$ are available as side information. Note also that Definition 1 for joint
sparsity can be extended to a subset of signals $\Gamma$ by considering the number of entries of
$\Theta$ that affect these signals:

$$K_{\mathrm{joint}}(\Gamma, P) = D - K_{\mathrm{cond}}(\Lambda \setminus \Gamma, P) = \left( \sum_{j \in \Gamma} K_j(P) \right) + K_C(P) - K_C(\Lambda \setminus \Gamma, P).$$

Note that $K_{\mathrm{cond}}(\Lambda, P) = K_{\mathrm{joint}}(\Lambda, P) = D$.
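
The relationship between the conditional and joint sparsities can be made concrete with a small example. This is a sketch under assumptions: the helper names `overlap`, `k_cond`, and `k_joint` are hypothetical, supports are represented as Python sets of indices, and the toy supports are chosen so that no index lies in all supports (so that $P$ is full rank).

```python
def overlap(supp_c, supp, gamma):
    """K_C(Gamma, P) computed from supports; zero for the empty set by definition."""
    if not gamma:
        return 0
    outside = [j for j in supp if j not in gamma]
    return sum(1 for n in supp_c if all(n in supp[j] for j in outside))


def k_cond(supp_c, supp, gamma):
    """Conditional sparsity: innovations of Gamma plus the overlap penalty."""
    return sum(len(supp[j]) for j in gamma) + overlap(supp_c, supp, gamma)


def k_joint(supp_c, supp, gamma):
    """Joint sparsity of the subset Gamma, defined as D - K_cond(Lambda minus Gamma)."""
    everyone = set(supp)
    d = k_cond(supp_c, supp, everyone)
    return d - k_cond(supp_c, supp, everyone - set(gamma))


# Toy supports: K_C = 3, K_1 = 1, K_2 = 2, hence D = 6.
supp_c = {0, 2, 4}
supp = {1: {0}, 2: {2, 4}}

for gamma in ({1}, {2}, {1, 2}):
    print(sorted(gamma), k_cond(supp_c, supp, gamma), k_joint(supp_c, supp, gamma))
```

In this example signal 1 has $K_{\mathrm{joint}}(\{1\}) = 3$, matching the closed form $K_1 + K_C - K_C(\{2\}) = 1 + 3 - 1$, and both quantities equal $D = 6$ for $\Gamma = \Lambda$.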

The bipartite graph introduced in Section 4.1 is the cornerstone of Theorems 3, 4, and 5, which 
consider whether a perfect matching can be found in the graph; see the proofs in Appendices B, D, 
and E, respectively, for detail. 

Theorem 3 (Achievable, known P) Assume that a signal ensemble $X$ is obtained from a
common/innovation component JSM $\mathcal{P}$. Let $M = (M_1, M_2, \ldots, M_J)$ be a measurement
tuple, let $\{\Phi_j\}_{j \in \Lambda}$ be random matrices having $M_j$ rows of i.i.d. Gaussian entries
for each $j \in \Lambda$, and write $Y = \Phi X$. Suppose there exists a full rank location matrix
$P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge K_{\mathrm{cond}}(\Gamma, P) \tag{7}$$

for all $\Gamma \subseteq \Lambda$. Then with probability one over $\{\Phi_j\}_{j \in \Gamma}$, there
exists a unique solution $\hat{\Theta}$ to the system of equations $Y = \Phi P \hat{\Theta}$; hence,
the signal ensemble $X$ can be uniquely recovered as $X = P\hat{\Theta}$.

Theorem 4 (Achievable, unknown P) Assume that a signal ensemble $X$ and measurement matrices
$\{\Phi_j\}_{j \in \Lambda}$ follow the assumptions of Theorem 3. Suppose there exists a full rank
location matrix $P^* \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j \ge K_{\mathrm{cond}}(\Gamma, P^*) + |\Gamma| \tag{8}$$

for all $\Gamma \subseteq \Lambda$. Then $X$ can be uniquely recovered from $Y$ with probability one
over $\{\Phi_j\}_{j \in \Gamma}$.



Theorem 5 (Converse) Assume that a signal ensemble $X$ and measurement matrices
$\{\Phi_j\}_{j \in \Lambda}$ follow the assumptions of Theorem 3. Suppose there exists a full rank
location matrix $P \in \mathcal{P}_F(X)$ such that

$$\sum_{j \in \Gamma} M_j < K_{\mathrm{cond}}(\Gamma, P) \tag{9}$$

for some $\Gamma \subseteq \Lambda$. Then there exists a solution $\hat{\Theta}$ such that
$Y = \Phi P \hat{\Theta}$ but $\hat{X} := P\hat{\Theta} \ne X$.

The identification of a feasible location matrix $P$ causes the one measurement per sensor gap that
prevents (8)-(9) from being a tight converse and achievable bound pair. We note in passing that the
signal recovery procedure used in Theorem 4 is akin to $\ell_0$-norm minimization on $X$; see
Appendix D for details.
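
To see how conditions (7), (8), and (9) interact, one can enumerate all nonempty subsets $\Gamma \subseteq \Lambda$ for a candidate measurement tuple. The following sketch is illustrative only: `classify` and `k_cond` are hypothetical helper names, the supports are the toy example with $K_C = 3$, $K_1 = 1$, $K_2 = 2$, and the location matrix $P$ is assumed to be full rank.

```python
from itertools import combinations


def k_cond(supp_c, supp, gamma):
    """Conditional sparsity K_cond(Gamma, P) for a nonempty subset Gamma."""
    outside = [j for j in supp if j not in gamma]
    penalty = sum(1 for n in supp_c if all(n in supp[j] for j in outside))
    return sum(len(supp[j]) for j in gamma) + penalty


def classify(num_meas, supp_c, supp):
    """Report which of the conditions (7), (8), (9) a measurement tuple satisfies."""
    sensors = sorted(supp)
    subsets = [
        set(c)
        for r in range(1, len(sensors) + 1)
        for c in combinations(sensors, r)
    ]
    # slack = sum of M_j over Gamma minus K_cond(Gamma, P)
    slack = [
        sum(num_meas[j] for j in g) - k_cond(supp_c, supp, g) for g in subsets
    ]
    if any(s < 0 for s in slack):
        return "converse: (9) holds for some subset"
    if all(s >= len(g) for s, g in zip(slack, subsets)):
        return "achievable with unknown P: (8) holds for all subsets"
    return "achievable with known P only: (7) holds for all subsets"


supp_c = {0, 2, 4}
supp = {1: {0}, 2: {2, 4}}

for num_meas in ({1: 4, 2: 4}, {1: 3, 2: 3}, {1: 2, 2: 5}):
    print(num_meas, "->", classify(num_meas, supp_c, supp))
```

The third tuple shows why the bounds are stated per subset: the total of 7 measurements exceeds $D = 6$, yet sensor 1 alone falls short of $K_{\mathrm{cond}}(\{1\}, P) = 3$, so the converse applies.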

4.4 Discussion 

The bounds in Theorems 3-5 are dependent on the dimensionality of the subspaces in which the
signals reside. The number of noiseless measurements required for ensemble recovery is determined
by the dimensionality $\dim(S)$ of the subspace $S$ in the relevant signal model, because
dimensionality and sparsity play a volumetric role akin to the entropy $H$ used to characterize rates
in source coding. Whereas in source coding each bit resolves between two options, and $2^{NH}$
typical inputs are described using $NH$ bits [12], in CS we have $M = \dim(S) + O(1)$. Similar to
Slepian-Wolf coding [13], the number of measurements required for each sensor must account for the
minimal features unique to that sensor, while at the same time features that appear among multiple
sensors must be amortized over the group.

Theorems 3-5 can also be applied to the single sensor and joint measurement settings. In the
single-signal setting (Theorem 1), we will have $x = P\theta$ with $\theta \in \mathbb{R}^K$, and
$\Lambda = \{1\}$; Theorem 4 provides the requirement $M \ge K + 1$. It is easy to show that the
joint measurement is equivalent to the single-signal setting: we stack all the individual signals
into a single signal vector, and in both cases all measurements are dependent on all the entries of
the signal vector. However, the distribution of the measurements among the available sensors is
irrelevant in a joint measurement setting. Therefore, we only obtain a necessary condition
$\sum_{j} M_j \ge D + 1$ on the total number of measurements required.

5 Practical Recovery Algorithms and Experiments 

Although we have provided a unifying theoretical treatment for the three JSM models, the nuances 
warrant further study. In particular, while Theorem 4 highlights the basic tradeoffs that must 
be made in partitioning the measurement budget among sensors, the result does not by design 
provide insight into tractable algorithms for signal recovery. We believe there is additional insight 
to be gained by considering each model in turn, and while the presentation may be less unified, we 
attribute this to the fundamental diversity of problems that can arise under the umbrella of jointly 
sparse signal representations. In this section, we focus on tractable recovery algorithms for each 
model and, when possible, analyze the corresponding measurement requirements. 

5.1 Recovery strategies for sparse common + innovations (JSM-1)

We first characterize the sparse common signal and innovations model JSM-1 from Section 3.3.1. 
For simplicity, we limit our description to J = 2 signals, but describe extensions to multiple signals 
as needed. 

5.1.1 Measurement bounds for joint recovery 

Under the JSM-1 model, separate recovery of the signal $x_j$ via $\ell_0$-norm minimization would
require $K_{\mathrm{joint}}(\{j\}) + 1 = K_{\mathrm{cond}}(\{j\}) + 1 = K_C + K_j - K_C(\Lambda \setminus \{j\}) + 1$
measurements, where $K_C(\Lambda \setminus \{j\})$ accounts for sparsity reduction due to overlap
between $z_C$ and $z_j$. We apply Theorem 4 to the JSM-1 model to obtain the corollary below. To
address the possibility of sparsity reduction, we denote by $K_\cap$ the number of indices in which
the common component $z_C$ and all innovation components $z_j$, $j \in \Lambda$, overlap; this results
in sparsity reduction for the common component.



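Corollary 1 Assume the measurement matrices $\{\Phi_j\}_{j \in \Lambda}$ contain i.i.d. Gaussian
entries. Then the signal ensemble $X$ can be recovered with probability one if the following
conditions hold:

$$\sum_{j \in \Gamma} M_j \ge \left( \sum_{j \in \Gamma} K_j \right) + K_C(\Gamma) + |\Gamma|, \quad \Gamma \ne \Lambda,$$

$$\sum_{j \in \Lambda} M_j \ge K_C + \left( \sum_{j \in \Lambda} K_j \right) - K_\cap + J.$$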

Our joint recovery scheme provides a significant savings in measurements, because the common 
component can be measured as part of any of the J signals. 

5.1.2 Stochastic signal model for JSM-1 

To give ourselves a firm footing for analysis, in the remainder of Section 5.1 we use a stochastic 
process for JSM-1 signal generation. This framework provides an information theoretic setting 
where we can scale the size of the problem and investigate which measurement rates enable recovery. 
We generate the common and innovation components as follows. For $n \in \{1, \ldots, N\}$ the
decisions whether $z_C(n)$ and $z_j(n)$ are zero or not are i.i.d. Bernoulli processes, where the
probability of a nonzero value is given by parameters denoted $S_C$ and $S_j$, respectively. The
values of the nonzero coefficients are then generated from an i.i.d. Gaussian distribution. The
outcome of this process is that $z_C$ and $z_j$ have sparsities $K_C \sim \mathrm{Binomial}(N, S_C)$
and $K_j \sim \mathrm{Binomial}(N, S_j)$. The parameters $S_j$ and $S_C$ are sparsity rates
controlling the random generation of each signal. Our model resembles the Gaussian spike
process [58], which is a limiting case of a Gaussian mixture model.
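
A minimal sketch of this generation process is given below. It is not the simulation code behind the reported results: the function names `bernoulli_gaussian` and `generate_jsm1` are hypothetical, and the nonzero values are drawn from a standard Gaussian (zero mean, unit variance), which is an assumption since the text does not specify the variance.

```python
import random


def bernoulli_gaussian(n, rate, rng):
    """Length-n vector: each entry is nonzero with probability `rate`,
    and nonzero entries are standard Gaussian (assumed variance 1)."""
    return [rng.gauss(0.0, 1.0) if rng.random() < rate else 0.0 for _ in range(n)]


def generate_jsm1(n, s_c, s_innov, seed=0):
    """Draw z_C, the innovations z_j, and the signals x_j = z_C + z_j."""
    rng = random.Random(seed)
    z_c = bernoulli_gaussian(n, s_c, rng)
    z = [bernoulli_gaussian(n, s_j, rng) for s_j in s_innov]
    x = [[c + v for c, v in zip(z_c, z_j)] for z_j in z]
    return z_c, z, x


# Sparsity rates taken from the simulations reported later (S_C = 0.2, S_j = 0.05).
n = 10000
z_c, z, x = generate_jsm1(n, 0.2, [0.05, 0.05], seed=1)
k_c = sum(1 for v in z_c if v != 0.0)
k_1 = sum(1 for v in z[0] if v != 0.0)
print(k_c / n, k_1 / n)  # empirical sparsity rates, close to S_C and S_1
```

The empirical ratios $K_C / N$ and $K_j / N$ fluctuate around $S_C$ and $S_j$, with the fluctuation shrinking as $N$ grows.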

Likelihood of sparsity reduction and overlap: This stochastic model can yield signal
ensembles for which the corresponding generating matrices $P$ allow for sparsity reduction;
specifically, there might be overlap between the supports of the common component $z_C$ and all the
innovation components $z_j$, $j \in \Lambda$. For $J = 2$, the probability that a given index is
present in all supports is $S_\cap := S_C S_1 S_2$. Therefore, the distribution of the cardinality of
this overlap is $K_\cap \sim \mathrm{Binomial}(N, S_\cap)$. We must account for the reduction obtained
from the removal of the corresponding number of columns from the location matrix $P$ when the total
number of measurements $M_1 + M_2$ is considered. In the same way we can show that the distributions
for the number of indices in the overlaps required by Corollary 1 are
$K_C(\{1\}) \sim \mathrm{Binomial}(N, S_{C,\{1\}})$ and $K_C(\{2\}) \sim \mathrm{Binomial}(N, S_{C,\{2\}})$,
where $S_{C,\{1\}} := S_C (1 - S_1) S_2$ and $S_{C,\{2\}} := S_C S_1 (1 - S_2)$.
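
For a sense of scale, the overlap rates can be evaluated for the sparsity rates used in the simulations below ($S_C = 0.2$, $S_1 = S_2 = 0.05$, $N = 1000$). The snippet is a small illustrative calculation; the function name `jsm1_overlap_rates` and the dictionary keys are hypothetical.

```python
def jsm1_overlap_rates(s_c, s_1, s_2):
    """Per-index probabilities of the overlap events for J = 2."""
    return {
        "all_supports": s_c * s_1 * s_2,          # common and both innovations
        "gamma_1": s_c * (1.0 - s_1) * s_2,       # rate of K_C({1})
        "gamma_2": s_c * s_1 * (1.0 - s_2),       # rate of K_C({2})
    }


n = 1000
rates = jsm1_overlap_rates(0.2, 0.05, 0.05)
# A Binomial(N, p) count has mean N * p.
expected_counts = {name: n * p for name, p in rates.items()}
print(rates)
print(expected_counts)
```

With these rates, full three-way overlaps are rare (on average half an index per ensemble), whereas the pairwise overlaps entering Corollary 1 average about 9.5 indices each.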

Measurement rate region: To characterize DCS recovery performance, we introduce a
measurement rate region. We define the measurement rate $R_j$ in an asymptotic manner as

$$R_j := \lim_{N \to \infty} \frac{M_j}{N}, \quad j \in \Lambda.$$

Additionally, we note that

$$\lim_{N \to \infty} \frac{K_C}{N} = S_C \quad \text{and} \quad \lim_{N \to \infty} \frac{K_j}{N} = S_j, \quad j \in \Lambda.$$

Thus, we also set $S_{X_j} = S_C + S_j - S_C S_j$, $j \in \{1, 2\}$. For a measurement rate pair
$(R_1, R_2)$ and sources $X_1$ and $X_2$, we evaluate whether we can recover the signals with
vanishing probability of error as $N$ increases. In this case, we say that the measurement rate pair
is achievable.

For jointly sparse signals under JSM-1, separate recovery via $\ell_0$-norm minimization would
require a measurement rate $R_j = S_{X_j}$. Separate recovery via $\ell_1$-norm minimization would
require an overmeasuring factor $c(S_{X_j})$, and thus the measurement rate would become
$R_j = S_{X_j} \cdot c(S_{X_j})$. To improve upon these figures, we adapt the standard machinery of CS
to the joint recovery problem.

5.1.3 Joint recovery via $\ell_1$-norm minimization

As discussed in Section 2.3, solving an $\ell_0$-norm minimization is NP-hard, and so in practice we
must relax our $\ell_0$ criterion in order to make the solution tractable. We now study what penalty
must be paid for $\ell_1$-norm recovery of jointly sparse signals. Using the vector and frame

$$Z := \begin{bmatrix} z_C \\ z_1 \\ z_2 \end{bmatrix} \quad \text{and} \quad \Phi := \begin{bmatrix} \Phi_1 & \Phi_1 & 0 \\ \Phi_2 & 0 & \Phi_2 \end{bmatrix}, \tag{11}$$

we can represent the concatenated measurement vector $Y$ sparsely using the concatenated coefficient
vector $Z$, which contains $K_C + K_1 + K_2 - K_\cap$ nonzero coefficients, to obtain $Y = \Phi Z$.
With sufficient overmeasuring, we have seen experimentally that it is possible to recover a vector
$\hat{Z}$, which yields $\hat{x}_j = \hat{z}_C + \hat{z}_j$, $j = 1, 2$, by solving the weighted
$\ell_1$-norm minimization

$$\hat{Z} = \arg\min \; \gamma_C \|z_C\|_1 + \gamma_1 \|z_1\|_1 + \gamma_2 \|z_2\|_1 \quad \text{s.t.} \quad Y = \Phi Z, \tag{12}$$

where $\gamma_C, \gamma_1, \gamma_2 \ge 0$. We call this the $\gamma$-weighted $\ell_1$-norm
formulation; our numerical results (Section 5.1.6 and our technical report [59]) indicate a reduction
in the requisite number of measurements via this enhancement. If $K_1 = K_2$ and $M_1 = M_2$, then
without loss of generality we set $\gamma_1 = \gamma_2 = 1$ and numerically search for the best
parameter $\gamma_C$. We discuss the asymmetric case with $K_1 = K_2$ and $M_1 \ne M_2$ in the
technical report [59].
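
The block structure of the frame in (11) can be illustrated with a short sketch. This is not a solver for (12), which would require a linear programming routine; it only builds the concatenated frame, checks that $Y = \Phi Z$ reproduces the per-sensor measurements $\Phi_j x_j$, and evaluates the weighted objective at the true components. The helper names (`matvec`, `joint_frame`, `weighted_l1`) and the toy dimensions are assumptions made for the example.

```python
import random


def matvec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(r * t for r, t in zip(row, v)) for row in a]


def joint_frame(phi_1, phi_2):
    """Concatenated frame [[Phi_1, Phi_1, 0], [Phi_2, 0, Phi_2]] of equation (11)."""
    n = len(phi_1[0])
    zeros = [0.0] * n
    top = [row + row + zeros for row in phi_1]
    bottom = [row + zeros + row for row in phi_2]
    return top + bottom


def weighted_l1(z_c, z_1, z_2, gamma_c, gamma_1, gamma_2):
    """Objective of the gamma-weighted l1 formulation (12)."""
    def l1(v):
        return sum(abs(t) for t in v)
    return gamma_c * l1(z_c) + gamma_1 * l1(z_1) + gamma_2 * l1(z_2)


rng = random.Random(0)
n, m_1, m_2 = 6, 3, 4
phi_1 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m_1)]
phi_2 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m_2)]

z_c = [1.0, 0.0, 0.0, -2.0, 0.0, 0.0]
z_1 = [0.0, 0.5, 0.0, 0.0, 0.0, 0.0]
z_2 = [0.0, 0.0, 0.0, 0.0, 3.0, 0.0]
x_1 = [c + v for c, v in zip(z_c, z_1)]
x_2 = [c + v for c, v in zip(z_c, z_2)]

phi = joint_frame(phi_1, phi_2)
y = matvec(phi, z_c + z_1 + z_2)                    # Y = Phi Z with Z = [z_C; z_1; z_2]
y_direct = matvec(phi_1, x_1) + matvec(phi_2, x_2)  # stacked y_1 and y_2
max_gap = max(abs(a - b) for a, b in zip(y, y_direct))
objective = weighted_l1(z_c, z_1, z_2, 1.0, 1.0, 1.0)
print(len(phi), len(phi[0]), max_gap, objective)
```

The frame has $M_1 + M_2$ rows and $3N$ columns, so the unknown in (12) has length $3N$ even though only $2N$ signal coefficients are ultimately recovered.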



5.1.4 Converse bound on performance of $\gamma$-weighted $\ell_1$-norm minimization

We now provide a converse bound that describes what measurement rate pairs cannot be achieved
via the $\gamma$-weighted $\ell_1$-norm minimization. Our notion of a converse focuses on the setting
where each signal $x_j$ is measured via multiplication by the $M_j$ by $N$ matrix $\Phi_j$ and joint
recovery is performed via our $\gamma$-weighted $\ell_1$-norm formulation (12). Within this setting, a
converse region is a set of measurement rates for which the recovery fails with overwhelming
probability as $N$ increases.

We assume that $J = 2$ sources have innovation sparsity rates that satisfy $S_1 = S_2 = S_I$.
Our first result, proved in Appendix F, provides deterministic necessary conditions to recover the
components $z_C$, $z_1$, and $z_2$, using the $\gamma$-weighted $\ell_1$-norm formulation (12). We
note that the lemma holds for all such combinations of components that generate the same signals
$x_1 = z_C + z_1$ and $x_2 = z_C + z_2$.

Lemma 1 Consider any $\gamma_C$, $\gamma_1$, and $\gamma_2$ in the $\gamma$-weighted $\ell_1$-norm
formulation (12). The components $z_C$, $z_1$, and $z_2$ can be recovered using measurement matrices
$\Phi_1$ and $\Phi_2$ only if (i) $z_1$ can be recovered via $\ell_1$-norm minimization (3) using
$\Phi_1$ and measurements $\Phi_1 z_1$; (ii) $z_2$ can be recovered via $\ell_1$-norm minimization
using $\Phi_2$ and measurements $\Phi_2 z_2$; and (iii) $z_C$ can be recovered via $\ell_1$-norm
minimization using the joint matrix $[\Phi_1^T \; \Phi_2^T]^T$ and measurements
$[\Phi_1^T \; \Phi_2^T]^T z_C$.



Lemma 1 can be interpreted as follows. If $M_1$ and $M_2$ are not large enough individually, then
the innovation components $z_1$ and $z_2$ cannot be recovered. This implies a converse bound on the
individual measurement rates $R_1$ and $R_2$. Similarly, combining Lemma 1 with the converse bound
of Theorem 2 for single-source $\ell_1$-norm minimization of the common component $z_C$ implies a
lower bound on the sum measurement rate $R_1 + R_2$.

Figure 2: Recovering a signal ensemble with sparse common + innovations (JSM-1). We chose a
common component sparsity rate $S_C = 0.2$ and innovation sparsity rates $S_I = S_1 = S_2 = 0.05$.
Our simulation results use the $\gamma$-weighted $\ell_1$-norm formulation (12) on signals of length
$N = 1000$; the measurement rate pairs that achieved perfect recovery over 100 simulations are
denoted by circles.

Anticipated converse: As shown in Corollary 1, for indices $n$ such that $x_1(n)$ and $x_2(n)$
differ and are nonzero, each sensor must take measurements to account for one of the two
coefficients. In the case where $S_1 = S_2 = S_I$, the joint sparsity rate is
$S_C + 2S_I - S_C S_I^2$. We define the measurement function $c'(S) := S \cdot c(S)$ based on Donoho
and Tanner's oversampling factor $c(S)$ (Theorem 2). It can be shown that the function $c'(\cdot)$ is
concave; in order to minimize the sum rate bound, we "explain" as many of the sparse coefficients as
possible in one of the signals and as few as possible in the other. From Corollary 1, we have
$R_1, R_2 \ge S_I + S_C S_I - S_C S_I^2$. Consequently, one of the signals must "explain" this
sparsity rate, whereas the other signal must explain the rest:

$$[S_C + 2S_I - S_C S_I^2] - [S_I + S_C S_I - S_C S_I^2] = S_C + S_I - S_C S_I.$$
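
The bookkeeping in this argument can be verified numerically for the rates used in Figure 2 ($S_C = 0.2$, $S_I = 0.05$). The short sketch below uses a hypothetical helper, `converse_sparsity_rates`, and evaluates only the sparsity rates; it does not evaluate $c'(\cdot)$, whose values depend on the oversampling factor $c(S)$ of Theorem 2.

```python
def converse_sparsity_rates(s_c, s_i):
    """Sparsity rates entering the anticipated converse for J = 2, S_1 = S_2 = S_I."""
    joint = s_c + 2.0 * s_i - s_c * s_i ** 2       # joint sparsity rate
    per_sensor = s_i + s_c * s_i - s_c * s_i ** 2  # rate each sensor must explain
    rest = joint - per_sensor                      # remainder explained by one sensor
    return joint, per_sensor, rest


s_c, s_i = 0.2, 0.05
joint, per_sensor, rest = converse_sparsity_rates(s_c, s_i)
closed_form_rest = s_c + s_i - s_c * s_i
print(joint, per_sensor, rest, closed_form_rest)
```

For these rates the remainder equals $S_C + S_I - S_C S_I = 0.24$, which is also the sparsity rate $S_{X_j}$ of a single signal.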

Unfortunately, the derivation of $c'(S)$ relies on Gaussianity of the measurement matrix, whereas
in our case $\Phi$ has a block matrix form. Therefore, the following conjecture remains to be proved
rigorously.

Conjecture 1 Let $J = 2$ and fix the sparsity rate of the common component $S_C$ and the innovation
sparsity rates $S_1 = S_2 = S_I$. Then the following conditions on the measurement rates are
necessary to enable recovery with probability one:

$$R_j \ge c'\left(S_I + S_C S_I - S_C S_I^2\right), \quad j = 1, 2,$$

$$R_1 + R_2 \ge c'\left(S_I + S_C S_I - S_C S_I^2\right) + c'\left(S_C + S_I - S_C S_I\right).$$



5.1.5 Achievable bound on performance of $\ell_1$-norm minimization

We have not yet characterized the performance of the $\gamma$-weighted $\ell_1$-norm formulation (12)
analytically. Instead, Theorem 6 below uses an alternative $\ell_1$-norm based recovery technique.
The proof describes a constructive recovery algorithm. We construct measurement matrices $\Phi_1$
and $\Phi_2$, each consisting of two parts. The first parts of the matrices are identical and recover
$x_1 - x_2$. The second parts of the matrices are different and enable the recovery of
$\frac{1}{2} x_1 + \frac{1}{2} x_2$. Once these two components have been recovered, the computation of
$x_1$ and $x_2$ is straightforward. The measurement rate can be computed by considering both
identical and different parts of the measurement matrices.
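
The final recombination step is elementary: given the difference $d = x_1 - x_2$ and the average $a = \frac{1}{2} x_1 + \frac{1}{2} x_2$, we have $x_1 = a + d/2$ and $x_2 = a - d/2$. The sketch below illustrates only this last step, assuming both intermediate vectors have already been recovered exactly; the function name `recombine` and the toy signals are hypothetical.

```python
def recombine(diff, avg):
    """Recover x_1 and x_2 from diff = x_1 - x_2 and avg = (x_1 + x_2) / 2."""
    x_1 = [a + d / 2.0 for a, d in zip(avg, diff)]
    x_2 = [a - d / 2.0 for a, d in zip(avg, diff)]
    return x_1, x_2


# Toy signals sharing a common component at indices 0 and 2.
x_1_true = [1.0, 0.0, 2.5, 0.0]
x_2_true = [1.0, -0.5, 2.0, 0.0]

diff = [u - v for u, v in zip(x_1_true, x_2_true)]        # equals z_1 - z_2
avg = [(u + v) / 2.0 for u, v in zip(x_1_true, x_2_true)]
x_1_hat, x_2_hat = recombine(diff, avg)
print(diff, avg, x_1_hat, x_2_hat)
```

Note that the common component cancels in the difference, so $x_1 - x_2 = z_1 - z_2$ depends on the innovations alone, which is why the first stage can be cheaper than recovering either signal on its own.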

Theorem 6 Let $J = 2$, $N \to \infty$ and fix the sparsity rate of the common component $S_C$ and the
innovation sparsity rates $S_1 = S_2 = S_I$. If the measurement rates satisfy the following
conditions:

$$R_j > c'\left(2S_I - S_I^2\right), \quad j = 1, 2, \tag{13a}$$

$$R_1 + R_2 > c'\left(2S_I - S_I^2\right) + c'\left(S_C + 2S_I - 2S_C S_I - S_I^2 + S_C S_I^2\right), \tag{13b}$$

then we can design measurement matrices $\Phi_1$ and $\Phi_2$ with random Gaussian entries and an
$\ell_1$-norm minimization recovery algorithm that succeeds with probability approaching one as $N$
increases. Furthermore, as $S_I \to 0$ the sum measurement rate approaches $c'(S_C)$.
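
The arguments of $c'(\cdot)$ in (13a) and (13b) can be read as the sparsity rates of the two vectors recovered in the construction. Under the stochastic model, and assuming that nonzero Gaussian values cancel with probability zero, $x_1 - x_2 = z_1 - z_2$ is nonzero at an index unless both innovations vanish there, and the average is nonzero unless all three components vanish. The following sketch checks that these complement-product forms agree with the expanded polynomials; the function names are hypothetical.

```python
def rate_difference(s_i):
    """Argument of c' in (13a): sparsity rate of x_1 - x_2."""
    return 2.0 * s_i - s_i ** 2


def rate_average(s_c, s_i):
    """Second argument of c' in (13b): sparsity rate of (x_1 + x_2) / 2."""
    return s_c + 2.0 * s_i - 2.0 * s_c * s_i - s_i ** 2 + s_c * s_i ** 2


s_c, s_i = 0.2, 0.05
# Complement forms: probability that not all relevant components are zero.
diff_alt = 1.0 - (1.0 - s_i) ** 2
avg_alt = 1.0 - (1.0 - s_c) * (1.0 - s_i) ** 2
print(rate_difference(s_i), diff_alt)
print(rate_average(s_c, s_i), avg_alt)
```

The second form also makes the limiting statement of the theorem visible: as $S_I \to 0$ the rate of the difference vanishes and the rate of the average tends to $S_C$.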

The theorem is proved in Appendix G. The recovery algorithm of Theorem 6 is based on linear
programming. It can be extended from $J = 2$ to an arbitrary number of signals by recovering all
signal differences of the form $x_{j_1} - x_{j_2}$ in the first stage of the algorithm and then
recovering $\frac{1}{J} \sum_j x_j$ in the second stage. In contrast, our $\gamma$-weighted
$\ell_1$-norm formulation (12) recovers a length-$JN$ signal. Our simulation experiments
(Section 5.1.6) indicate that the $\gamma$-weighted formulation can recover using fewer measurements
than the approach of Theorem 6.

The achievable measurement rate region of Theorem 6 is loose with respect to the region of the 
anticipated converse Conjecture 1 (see Figure 2). We leave for future work the characterization of a 
tight measurement rate region for computationally tractable (polynomial time) recovery techniques. 

5.1.6 Simulations for JSM-1 

We now present simulation results for several different JSM-1 settings. The $\gamma$-weighted
$\ell_1$-norm formulation (12) was used throughout, where the optimal choice of $\gamma_C$,
$\gamma_1$, and $\gamma_2$ depends on the relative sparsities $K_C$, $K_1$, and $K_2$. The optimal
values have not been determined analytically. Instead, we rely on a numerical optimization, which is
computationally intense. A detailed discussion of our intuition behind the choice of $\gamma$ appears
in the technical report [59].

Recovering two signals with symmetric measurement rates: Our simulation setting is
as follows. The signal components $z_C$, $z_1$, and $z_2$ are assumed (without loss of generality) to
be sparse in $\Psi = I_N$ with sparsities $K_C$, $K_1$, and $K_2$, respectively. We assign random
Gaussian values to the nonzero coefficients. We restrict our attention to the symmetric setting in
which $K_1 = K_2$ and $M_1 = M_2$, and consider signals of length $N = 50$ where
$K_C + K_1 + K_2 = 15$.

In our joint decoding simulations, we consider values of $M_1$ and $M_2$ in the range between 10
and 40. We find the optimal $\gamma_C$ in the $\gamma$-weighted $\ell_1$-norm formulation (12) using a
line search optimization, where simulation indicates the "goodness" of specific $\gamma_C$ values in
terms of the likelihood of recovery. With the optimal $\gamma_C$, for each set of values we run
several thousand trials to determine the empirical probability of success in decoding $z_1$ and
$z_2$. The results of the simulation are summarized in Figure 3. The savings in the number of
measurements $M$ can be substantial, especially when the common component sparsity $K_C$ is large.
For $K_C = 11$, $K_1 = K_2 = 2$, $M$ is reduced by approximately 30%. For smaller $K_C$, joint
decoding barely outperforms separate decoding, since most of the measurements are expended on
innovation components. Additional results appear in [59].

Recovering two signals with asymmetric measurement rates: In Figure 2, we compare 
separate CS recovery with the anticipated converse bound of Conjecture 1, the achievable bound 
of Theorem 6, and numerical results. 
Figure 3: Comparison of joint decoding and separate decoding for JSM-1. The advantage of joint
over separate decoding depends on the common component sparsity.

Figure 4: Multi-sensor measurement results for JSM-1. We choose a common component sparsity
rate $S_C = 0.2$, innovation sparsity rates $S_I = 0.05$, and signals of length $N = 500$; our results
demonstrate a reduction in the measurement rate per sensor as the number of sensors $J$ increases.



We use $J = 2$ signals and choose a common component sparsity rate $S_C = 0.2$ and innovation
sparsity rates $S_I = S_1 = S_2 = 0.05$. We consider several different asymmetric measurement rates.
In each such setting, we constrain $M_2$ to have the form $M_2 = \alpha M_1$ for some $\alpha$, with $N = 1000$.
The results plotted indicate the smallest pairs $(M_1, M_2)$ for which we always succeeded in recovering
the signal over 100 simulation runs. In some areas of the measurement rate region our $\gamma$-weighted
$\ell_1$-norm formulation (12) requires fewer measurements than the achievable approach of Theorem 6.

Recovering multiple signals with symmetric measurement rates: The $\gamma$-weighted
$\ell_1$-norm recovery technique of this section is especially promising when $J > 2$ sensors are used. These
savings may be valuable in applications such as sensor networks, where data may contain strong 
spatial (inter-source) correlations. 

We use $J \in \{1, 2, \ldots, 10\}$ signals and choose the same sparsity rates $S_C = 0.2$ and $S_j = 0.05$
as the asymmetric rate simulations; here we use symmetric measurement rates and let $N = 500$.
The results of Figure 4 describe the smallest symmetric measurement rates for which we always
succeeded in recovering the signal over 100 simulation runs. As $J$ increases, lower measurement rates
can be used; the results compare favorably with the lower bound from Conjecture 1, which gives
$R_j \approx 0.232$ as $J \to \infty$.

5.2 Recovery strategies for common sparse supports (JSM-2)

Under the JSM-2 signal ensemble model from Section 3.3.2, separate recovery of each signal via
$\ell_0$-norm minimization would require $K + 1$ measurements per signal, while separate recovery via
$\ell_1$-norm minimization would require $cK$ measurements per signal. When Theorems 4 and 5 are
applied in the context of JSM-2, the bounds for joint recovery match those of individual recovery
using $\ell_0$-norm minimization. Within this context, it is also possible to recover one of the signals
using $K + 1$ measurements from the corresponding sensor, and then with the prior knowledge of
the support set $\Omega$, recover all other signals from $K$ measurements per sensor; thus providing an
additional savings of $J - 1$ measurements [60]. Surprisingly, we will demonstrate below that for
large $J$, the common support set can actually be recovered using only one measurement per sensor
and algorithms that are computationally tractable.

The algorithms we propose are inspired by conventional greedy pursuit algorithms for CS
(such as OMP [26]). In the single-signal case, OMP iteratively constructs the sparse support set $\Omega$;
decisions are based on inner products between the columns of $\Phi$ and a residual. In the multi-signal
case, there are more clues available for determining the elements of $\Omega$.

5.2.1 Recovery via Trivial Pursuit (TP) 

When there are many correlated signals in the ensemble, a simple non-iterative greedy algorithm 
based on inner products will suffice to recover the signals jointly. For simplicity but without loss 
of generality, we assume that an equal number of measurements Mj = M are taken of each signal. 
We write $\Phi_j$ in terms of its columns, with $\Phi_j = [\phi_{j,1}, \phi_{j,2}, \ldots, \phi_{j,N}]$.

Trivial Pursuit (TP) Algorithm for JSM-2 
1. Get greedy: Given all of the measurements, compute the test statistics

$$\xi_n = \frac{1}{J} \sum_{j=1}^{J} \langle y_j, \phi_{j,n} \rangle^2, \quad n \in \{1, 2, \ldots, N\}, \qquad (14)$$

and estimate the elements of the common coefficient support set by

$$\widehat{\Omega} = \{ n \text{ having one of the } K \text{ largest } \xi_n \}.$$
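
The test statistics in (14) and the selection rule can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the signals are sparse in the canonical basis, and the function name `trivial_pursuit` and the demo parameters ($N = 50$, $K = 5$, $M = 4$, $J = 5000$) are hypothetical choices made only to show support recovery with fewer than $K$ measurements per signal.

```python
import numpy as np

def trivial_pursuit(ys, Phis, K):
    """Estimate the common support from the statistics xi_n of (14)."""
    xi = np.zeros(Phis[0].shape[1])
    for y, Phi in zip(ys, Phis):
        # Phi.T @ y holds the inner products <y_j, phi_{j,n}> for all n
        xi += (Phi.T @ y) ** 2
    xi /= len(ys)
    # indices of the K largest statistics
    return set(np.argsort(xi)[-K:].tolist())

# Hypothetical demo: M = 4 < K = 5 measurements per signal, many signals.
rng = np.random.default_rng(0)
N, K, M, J = 50, 5, 4, 5000
support = rng.choice(N, size=K, replace=False)
ys, Phis = [], []
for _ in range(J):
    x = np.zeros(N)
    x[support] = rng.standard_normal(K)   # i.i.d. Gaussian coefficients
    Phi = rng.standard_normal((M, N))     # i.i.d. Gaussian measurement matrix
    Phis.append(Phi)
    ys.append(Phi @ x)

estimate = trivial_pursuit(ys, Phis, K)
true_support = set(support.tolist())
```

The sketch returns only the support estimate; as the text notes, recovering the coefficient values themselves still needs at least $K$ measurements per sensor.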

When the sparse, nonzero coefficients are sufficiently generic (as defined below), we have the 
following surprising result, which is proved in Appendix H. 

Theorem 7 Let $\Psi$ be an orthonormal basis for $\mathbb{R}^N$, let the measurement matrices contain i.i.d.
Gaussian entries, and assume that the nonzero coefficients in the $\theta_j$ are i.i.d. Gaussian random
variables. Then with $M \ge 1$ measurements per signal, TP recovers $\Omega$ with probability approaching
one as $J \to \infty$.

In words, with fewer than $K$ measurements per sensor, it is actually possible to recover the
sparse support set $\Omega$ under the JSM-2 model.* Of course, this approach does not recover the $K$
coefficient values for each signal; at least $K$ measurements per sensor are required for this.

Corollary 2 Assume that the nonzero coefficients in the $\theta_j$ are i.i.d. Gaussian random variables.
Then the following statements hold:

1. Let the measurement matrices contain i.i.d. Gaussian entries, with each matrix having an
overmeasuring factor of $c = 1$ (that is, $M_j = K$ for each measurement matrix $\Phi_j$). Then TP
recovers all signals from the ensemble $\{x_j\}$ with probability approaching one as $J \to \infty$.

2. Let $\Phi_j$ be a measurement matrix with overmeasuring factor $c < 1$ (that is, $M_j < K$), for
some $j \in \Lambda$. Then with probability one, the signal $x_j$ cannot be uniquely recovered by any
algorithm for any value of $J$.

*One can also show the somewhat stronger result that, as long as $\sum_j M_j \gg N$, TP recovers $\Omega$ with probability
approaching one. We have omitted this additional result for brevity.

The first statement is an immediate corollary of Theorem 7; the second statement follows
because each equation $y_j = \Phi_j x_j$ would be underdetermined even if the nonzero indices were
known. Thus, under the JSM-2 model, the TP algorithm asymptotically performs as well as
an oracle decoder that has prior knowledge of the locations of the sparse coefficients. From an
information theoretic perspective, Corollary 2 provides tight achievable and converse bounds for
JSM-2 signals. We should note that the theorems in this section have a slightly different flavor
than Theorems 4 and 5, which ensure recovery of any sparse signal ensemble, given a suitable set
of measurement matrices. Theorem 7 and Corollary 2 above, in contrast, rely on a random signal
model and do not guarantee simultaneous performance for all sparse signals under any particular
measurement ensemble. Nonetheless, we feel this result is worth presenting to highlight the strong
subspace concentration behavior that enables the correct identification of the common support.

In the technical reports [59, 61], we derive an approximate formula for the probability of error
in recovering the common support set $\Omega$, given $J$, $K$, $M$, and $N$. While theoretically interesting and
potentially practically useful, these results require $J$ to be large. Our numerical experiments show
that the number of measurements required for recovery using TP decreases quickly as $J$ increases.
However, in the case of small $J$, TP performs poorly. Hence, we propose next an alternative
recovery technique based on simultaneous greedy pursuit that performs well for small $J$.

5.2.2 Recovery via iterative greedy pursuit 

In practice, the common sparse support among the J signals enables a fast iterative algorithm 
to recover all of the signals jointly. Tropp and Gilbert have proposed one such algorithm, called 
Simultaneous Orthogonal Matching Pursuit (SOMP) [53], which can be readily applied in our DCS 
framework. SOMP is a variant of OMP that seeks to identify $\Omega$ one element at a time. A similar
simultaneous sparse approximation algorithm has been proposed using convex optimization [62]. 
We dub the DCS-tailored SOMP algorithm DCS-SOMP. 

To adapt the original SOMP algorithm to our setting, we first extend it to cover a different
measurement matrix for each signal $x_j$. Then, in each DCS-SOMP iteration, we select the
column index $n \in \{1, 2, \ldots, N\}$ that accounts for the greatest amount of residual energy across
all signals. As in SOMP, we orthogonalize the remaining columns (in each measurement matrix)
after each step; after convergence we obtain an expansion of the measurement vector $y_j$ on an
orthogonalized subset of the columns of basis vectors. To obtain the expansion coefficients in the
sparse basis, we then reverse the orthogonalization process using the QR matrix factorization.
Finally, we again assume that $M_j = M$ measurements per signal are taken.



DCS-SOMP Algorithm for JSM-2 

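1. Initialize: Set the iteration counter $\ell = 1$. For each signal index $j \in \Lambda$, initialize the
orthogonalized coefficient vectors $\widehat{\beta}_j = 0$, $\widehat{\beta}_j \in \mathbb{R}^M$; also initialize the set of selected indices
$\widehat{\Omega} = \emptyset$. Let $r_{j,\ell}$ denote the residual of the measurement $y_j$ remaining after the first $\ell$ iterations,
and initialize $r_{j,0} = y_j$.

2. Select the dictionary vector that maximizes the value of the sum of the magnitudes of the
projections of the residual, and add its index to the set of selected indices

$$n_\ell = \arg\max_{n \in \{1, \ldots, N\}} \sum_{j=1}^{J} \frac{|\langle r_{j,\ell-1}, \phi_{j,n} \rangle|}{\|\phi_{j,n}\|_2},$$

$$\widehat{\Omega} = [\widehat{\Omega} \;\; n_\ell].$$

3. Orthogonalize the selected basis vector against the orthogonalized set of previously selected
dictionary vectors

$$\gamma_{j,\ell} = \phi_{j,n_\ell} - \sum_{t=1}^{\ell-1} \frac{\langle \phi_{j,n_\ell}, \gamma_{j,t} \rangle}{\|\gamma_{j,t}\|_2^2} \gamma_{j,t}.$$

4. Iterate: Update the estimate of the coefficients for the selected vector and residuals

$$\widehat{\beta}_j(\ell) = \frac{\langle r_{j,\ell-1}, \gamma_{j,\ell} \rangle}{\|\gamma_{j,\ell}\|_2^2},$$

$$r_{j,\ell} = r_{j,\ell-1} - \frac{\langle r_{j,\ell-1}, \gamma_{j,\ell} \rangle}{\|\gamma_{j,\ell}\|_2^2} \gamma_{j,\ell}.$$

5. Check for convergence: If $\|r_{j,\ell}\|_2 > \epsilon \|y_j\|_2$ for all $j$, then increment $\ell$ and go to Step
2; otherwise, continue to Step 6. The parameter $\epsilon$ determines the target error power level
allowed for algorithm convergence.

6. De-orthogonalize: Consider the relationship between $\Gamma_j = [\gamma_{j,1}, \gamma_{j,2}, \ldots, \gamma_{j,M}]$ and the $\Phi_j$
given by the QR factorization $\Phi_{j,\widehat{\Omega}} = \Gamma_j R_j$, where $\Phi_{j,\widehat{\Omega}} = [\phi_{j,n_1}, \phi_{j,n_2}, \ldots, \phi_{j,n_M}]$ is the
so-called mutilated basis.* Since $y_j = \Gamma_j \beta_j = \Phi_{j,\widehat{\Omega}} x_{j,\widehat{\Omega}} = \Gamma_j R_j x_{j,\widehat{\Omega}}$, where $x_{j,\widehat{\Omega}}$ is the mutilated
coefficient vector, we can compute the signal estimates $\{\widehat{x}_j\}$ as

$$\widehat{x}_{j,\widehat{\Omega}} = R_j^{-1} \widehat{\beta}_j,$$

where $\widehat{x}_{j,\widehat{\Omega}}$ is the mutilated version of the sparse coefficient vector $\widehat{x}_j$.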

In practice, we obtain $cK$ measurements from each signal $x_j$ for some value of $c$. We then use
DCS-SOMP to recover the $J$ signals jointly. We orthogonalize because as the number of iterations
approaches $M$ the norms of the residues of an orthogonal pursuit decrease faster than for a
non-orthogonal pursuit; indeed, due to Step 3 the algorithm can only run for up to $M$ iterations. The
computational complexity of this algorithm is $O(JNM^2)$, which matches that of separate recovery
for each signal while reducing the required number of measurements.
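
The DCS-SOMP steps listed earlier in this subsection can be sketched in NumPy as follows. This is a minimal sketch under assumptions that are not in the text: the loop stops after a fixed number of iterations `K` instead of applying the $\epsilon$ test of Step 5, and Step 6 is replaced by a least-squares fit on the selected columns, which yields the same coefficients in exact arithmetic when those columns are linearly independent. The function name `dcs_somp` and the demo parameters are illustrative.

```python
import numpy as np

def dcs_somp(ys, Phis, K):
    """Greedy joint recovery for signals sharing one sparse support."""
    J = len(ys)
    residuals = [y.copy() for y in ys]
    ortho = [[] for _ in range(J)]   # orthogonalized selected columns per signal
    support = []
    for _ in range(K):
        # Step 2: sum over signals of normalized |<residual, column>|
        score = np.zeros(Phis[0].shape[1])
        for r, Phi in zip(residuals, Phis):
            score += np.abs(Phi.T @ r) / np.linalg.norm(Phi, axis=0)
        if support:
            score[support] = -np.inf  # never pick the same index twice
        n = int(np.argmax(score))
        support.append(n)
        for j in range(J):
            # Step 3: Gram-Schmidt against previously selected columns
            g = Phis[j][:, n].copy()
            for q in ortho[j]:
                g -= (q @ g) / (q @ q) * q
            ortho[j].append(g)
            # Step 4: remove the new direction from the residual
            residuals[j] = residuals[j] - (residuals[j] @ g) / (g @ g) * g
    # Stand-in for Step 6: least squares on the selected columns
    xs_hat = []
    for y, Phi in zip(ys, Phis):
        x = np.zeros(Phi.shape[1])
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x[support] = coef
        xs_hat.append(x)
    return support, xs_hat

# Hypothetical demo ensemble with a common support.
rng = np.random.default_rng(1)
N, K, M, J = 50, 5, 20, 16
true_support = sorted(rng.choice(N, size=K, replace=False).tolist())
xs, ys, Phis = [], [], []
for _ in range(J):
    x = np.zeros(N)
    x[true_support] = rng.standard_normal(K)
    Phi = rng.standard_normal((M, N))   # a different matrix for each signal
    xs.append(x)
    Phis.append(Phi)
    ys.append(Phi @ x)

support_hat, xs_hat = dcs_somp(ys, Phis, K)
```

Stopping after `K` iterations presumes the sparsity is known; the $\epsilon$-based test in Step 5 avoids that assumption at the cost of choosing a threshold.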

Thanks to the common sparsity structure among the signals, we believe (but have not proved)
that DCS-SOMP will succeed with $c < c(S)$. Empirically, we have observed that a small number
of measurements proportional to $K$ suffices for a moderate number of sensors $J$. Based on our
observations, described in Section 5.2.3, we conjecture that $K + 1$ measurements per sensor suffice
as $J \to \infty$. Thus, this efficient greedy algorithm enables an overmeasuring factor $c = (K + 1)/K$
that approaches 1 as $J$, $K$, and $N$ increase.

*We define a mutilated basis $\Phi_\Omega$ as a subset of the basis vectors from $\Phi = [\phi_1, \phi_2, \ldots, \phi_N]$ corresponding to the
indices given by the set $\Omega = \{n_1, n_2, \ldots, n_M\}$, that is, $\Phi_\Omega = [\phi_{n_1}, \phi_{n_2}, \ldots, \phi_{n_M}]$. This concept can be extended to
vectors in the same manner.

Figure 5: Recovering a signal ensemble with common sparse supports (JSM-2). We plot the probability of
perfect recovery via DCS-SOMP (solid lines) and separate CS recovery (dashed lines) as a function of the
number of measurements per signal $M$ and the number of signals $J$. We fix the signal length to $N = 50$,
the sparsity to $K = 5$, and average over 1000 simulation runs. An oracle encoder that knows the positions
of the large signal expansion coefficients would use 5 measurements per signal.

5.2.3 Simulations for JSM-2 

We now present simulations comparing separate CS recovery versus joint DCS-SOMP recovery for 
a JSM-2 signal ensemble. Figure 5 plots the probability of perfect recovery corresponding to various 
numbers of measurements M as the number of sensors varies from J = 1 to 32, over 1000 trials in 
each case. We fix the signal lengths at $N = 50$ and the sparsity of each signal to $K = 5$.

With DCS-SOMP, for perfect recovery of all signals the average number of measurements per
signal decreases as a function of $J$. The trend suggests that for large $J$ close to $K$ measurements per
signal should suffice. In contrast, with separate CS recovery, for perfect recovery of all signals
the number of measurements per sensor increases as a function of $J$. This occurs because each
signal experiences an independent probability $p \le 1$ of successful recovery; therefore the overall
probability of complete success is $p^J$. Consequently, each sensor must compensate by making
additional measurements. This phenomenon further motivates joint recovery under JSM-2.
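
To make the $p^J$ effect concrete, the short sketch below evaluates the ensemble success probability for independent per-signal recovery and the per-signal probability needed to hit a target. The numbers ($p = 0.99$, a 0.95 target, $J = 32$) and the function names are illustrative assumptions, not values from the simulations.

```python
def ensemble_success(p, J):
    """Probability that J independent recoveries all succeed."""
    return p ** J

def required_per_signal(target, J):
    """Per-signal success probability needed for a given ensemble target."""
    return target ** (1.0 / J)

# With 32 signals, a 99% per-signal success rate gives roughly 72% overall.
overall = ensemble_success(0.99, 32)
# To reach 95% overall, each signal must succeed with probability above 99.8%.
needed = required_per_signal(0.95, 32)
```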

Finally, we note that we can use algorithms other than DCS-SOMP to recover the signals under 
the JSM-2 model. Cotter et al. [55] have proposed additional algorithms (such as M-FOCUSS) that 
iteratively eliminate basis vectors from the dictionary and converge to the set of sparse basis vectors 
over which the signals are supported. We hope to extend such algorithms to JSM-2 in future work. 

5.3 Recovery strategies for nonsparse common component + sparse innovations (JSM-3)

The JSM-3 signal ensemble model from Section 3.3.3 provides a particularly compelling motivation
for joint recovery. Under this model, no individual signal $x_j$ is sparse, and so recovery of each
signal separately would require fully $N$ measurements per signal. As in the other JSMs, however,
the commonality among the signals makes it possible to substantially reduce this number. Again,
the potential for this savings is evidenced by specializing Theorem 4 to the context of JSM-3.

Corollary 3 If $\Phi_j$ is a random Gaussian matrix for all $j \in \Lambda$, $\mathcal{P}$ is defined by JSM-3, and

$$\sum_{j \in \Gamma} M_j \ge \sum_{j \in \Gamma} K_j + K_C(\Gamma, P) + |\Gamma|, \quad \Gamma \subsetneq \Lambda, \qquad (15)$$

$$\sum_{j \in \Lambda} M_j \ge \sum_{j \in \Lambda} K_j + N + J - K_R, \qquad (16)$$

then the signal ensemble $X$ can be uniquely recovered from $Y$ with probability one.

This suggests that the number of measurements of an individual signal can be substantially
decreased, as long as the total number of measurements is sufficiently large to capture enough
information about the nonsparse common component $z_C$. The term $K_R$ denotes the number of
indices where the common and all innovation components overlap, and appears due to the sparsity
reduction that can be performed at the common component before recovery. We also note that
when the supports of the innovations are independent, as $J \to \infty$, it becomes increasingly unlikely
that a given index will be included in all innovations, and thus the terms $K_C(\Gamma, P)$ and $K_R$ will go
to zero. On the other hand, when the supports are completely matched (implying $K_j = K$, $j \in \Lambda$),
we will have $K_R = K$, and after sparsity reduction has been addressed, $K_C(\Gamma, P) = 0$ for all $\Gamma \subseteq \Lambda$.

5.3.1 Recovery via Transpose Estimation of Common Component (TECC) 

Successful recovery of the signal ensemble $\{x_j\}$ requires recovery of both the nonsparse common
component $z_C$ and the sparse innovations $\{z_j\}$. To help build intuition about how we might
accomplish signal recovery using far fewer than $N$ measurements per sensor, consider the following
thought experiment.

If $z_C$ were known, then each innovation $z_j$ could be estimated using the standard single-signal
CS machinery on the adjusted measurements $y_j - \Phi_j z_C = \Phi_j z_j$. While $z_C$ is not known in advance,
it can be estimated from the measurements. In fact, across all $J$ sensors, a total of $\sum_{j \in \Lambda} M_j$ random
projections of $z_C$ are observed (each corrupted by a contribution from one of the $z_j$). Since $z_C$ is
not sparse, it cannot be recovered via CS techniques, but when the number of measurements is
sufficiently large ($\sum_{j \in \Lambda} M_j \gg N$), $z_C$ can be estimated using standard tools from linear algebra.

A key requirement for such a method to succeed in recovering $z_C$ is that each $\Phi_j$ be different, so
that their rows combine to span all of $\mathbb{R}^N$. In the limit (again, assuming the sparse innovation
coefficients are well-behaved), the common component $z_C$ can be recovered while still allowing each
sensor to operate at the minimum measurement rate dictated by the $\{z_j\}$. A prototype algorithm
is listed below, where we assume that each measurement matrix $\Phi_j$ has i.i.d. $\mathcal{N}(0, \sigma_j^2)$ entries.

TECC Algorithm for JSM-3 

1. Estimate common component: Define the matrix $\widehat{\Phi}$ as the vertical concatenation of the
regularized individual measurement matrices $\widehat{\Phi}_j = \frac{1}{M_j \sigma_j^2} \Phi_j$, that is, $\widehat{\Phi} = [\widehat{\Phi}_1^T, \widehat{\Phi}_2^T, \ldots, \widehat{\Phi}_J^T]^T$.
Calculate the estimate of the common component as $\widetilde{z}_C = \frac{1}{J} \widehat{\Phi}^T Y$.

2. Estimate measurements generated by innovations: Using the previous estimate, subtract
the contribution of the common part from the measurements and generate estimates for
the measurements caused by the innovations for each signal: $\widehat{y}_j = y_j - \Phi_j \widetilde{z}_C$.

3. Recover innovations: Using a standard single-signal CS recovery algorithm,* obtain estimates
of the innovations $\widehat{z}_j$ from the estimated innovation measurements $\widehat{y}_j$.

4. Obtain signal estimates: Estimate each signal as the sum of the common and innovations
estimates; that is, $\widehat{x}_j = \widetilde{z}_C + \widehat{z}_j$.

*For tractable analysis of the TECC algorithm, the proof of Theorem 8 employs a least-squares variant of $\ell_0$-norm
minimization.
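
Steps 1 and 2 of TECC reduce to a scaled back-projection followed by a subtraction, which the following NumPy sketch illustrates. It is a hedged illustration only: the function names, the choice $\sigma_j^2 = 1$, and the demo sizes are assumptions, and Steps 3 and 4 (the single-signal CS recovery) are left out. The demo compares the relative error of the common-component estimate for a small and a large number of sensors.

```python
import numpy as np

def tecc_common_estimate(ys, Phis, sigma2s):
    """Step 1: average of Phi_j^T y_j / (M_j * sigma_j^2) over the J sensors."""
    z = np.zeros(Phis[0].shape[1])
    for y, Phi, s2 in zip(ys, Phis, sigma2s):
        M = Phi.shape[0]
        z += Phi.T @ y / (M * s2)
    return z / len(ys)

def innovation_measurements(ys, Phis, z_c_hat):
    """Step 2: subtract the estimated common contribution from each y_j."""
    return [y - Phi @ z_c_hat for y, Phi in zip(ys, Phis)]

def relative_error(J, N=50, K=5, M=10, seed=0):
    """Relative error of the Step 1 estimate on a synthetic JSM-3 ensemble."""
    rng = np.random.default_rng(seed)
    z_c = rng.standard_normal(N)            # nonsparse common component
    ys, Phis = [], []
    for _ in range(J):
        z_j = np.zeros(N)                   # K-sparse innovation
        z_j[rng.choice(N, size=K, replace=False)] = rng.standard_normal(K)
        Phi = rng.standard_normal((M, N))   # unit-variance entries (assumed)
        Phis.append(Phi)
        ys.append(Phi @ (z_c + z_j))
    est = tecc_common_estimate(ys, Phis, [1.0] * J)
    return np.linalg.norm(est - z_c) / np.linalg.norm(z_c)

err_small = relative_error(20)     # few sensors: coarse estimate
err_large = relative_error(2000)   # many sensors: estimate tightens
```

The slow decay of the error with $J$ in this sketch is consistent with the asymptotic character of Theorem 8 and with the motivation for the iterative refinement in ACIE.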

The following theorem, proved in Appendix I, shows that asymptotically, by using the TECC
algorithm, each sensor needs to only measure at the rate dictated by the sparsity $K_j$.

Theorem 8 Assume that the nonzero expansion coefficients of the sparse innovations $z_j$ are i.i.d.
Gaussian random variables and that their locations are uniformly distributed on $\{1, 2, \ldots, N\}$. Let
the measurement matrices $\Phi_j$ contain i.i.d. $\mathcal{N}(0, \sigma_j^2)$ entries with $M_j \ge K_j + 1$. Then each signal
$x_j$ can be recovered using the TECC algorithm with probability approaching one as $J \to \infty$.

For large J, the measurement rates permitted by Theorem 8 are the best possible for any 
recovery strategy for JSM-3 signals, even neglecting the presence of the nonsparse component. 
These rates meet the minimum bounds suggested by Corollary 3, although again Theorem 8 is of 
a slightly different flavor, as it does not provide a uniform guarantee for all sparse signal ensembles 
under any particular measurement matrix collection. The CS technique employed in Theorem 8 
involves combinatorial searches that estimate the innovation components; we have provided the 
theorem simply as support for our intuitive development of the TECC algorithm. More efficient 
techniques could also be employed (including several proposed for CS in the presence of noise [25, 63, 
64]). It is reasonable to expect similar behavior; as the error in estimating the common component 
diminishes, these techniques should perform similarly to their noiseless analogues. 

5.3.2 Recovery via Alternating Common and Innovation Estimation (ACIE) 

The preceding analysis demonstrates that the number of required measurements in JSM-3 can be
substantially reduced through joint recovery. While Theorem 8 shows theoretical gains as $J \to \infty$,
practical gains can also be realized with a moderate number of sensors. In particular, suppose
in the TECC algorithm that the initial estimate $\widetilde{z}_C$ is not accurate enough to enable correct
identification of the sparse innovation supports $\{\Omega_j\}$. In such a case, it may still be possible for
a rough approximation of the innovations $\{z_j\}$ to help refine the estimate $\widetilde{z}_C$. This in turn could
help to refine the estimates of the innovations.

The Alternating Common and Innovation Estimation (ACIE) algorithm exploits the observation
that once the basis vectors comprising the innovation $z_j$ have been identified in the index set
$\Omega_j$, their effect on the measurements $y_j$ can be removed to aid in estimating $z_C$. Suppose that we
have an estimate for these innovation basis vectors in $\widehat{\Omega}_j$. We can then partition the measurements
into two parts: the projection into $\mathrm{span}(\{\phi_{j,n}\}_{n \in \widehat{\Omega}_j})$ and the component orthogonal to that span.

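We build a basis for the space $\mathbb{R}^{M_j}$ where $y_j$ lives:

$$B_j = [\Phi_{j,\widehat{\Omega}_j} \;\; Q_j],$$

where $\Phi_{j,\widehat{\Omega}_j}$ is the mutilated matrix corresponding to the indices in $\widehat{\Omega}_j$, and the $M_j \times (M_j - |\widehat{\Omega}_j|)$
matrix $Q_j$ has orthonormal columns that span the orthogonal complement of $\Phi_{j,\widehat{\Omega}_j}$.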

This construction allows us to remove the projection of the measurements into the
aforementioned span to obtain measurements caused exclusively by vectors not in $\widehat{\Omega}_j$:

$$\widetilde{y}_j = Q_j^T y_j \quad \text{and} \quad \widetilde{\Phi}_j = Q_j^T \Phi_j. \qquad (17)$$

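These modifications enable the sparse decomposition of the measurement, which now lives in
$\mathbb{R}^{M_j - |\widehat{\Omega}_j|}$, to remain unchanged:

$$\widetilde{y}_j = \sum_{n=1}^{N} x_j(n) \, \widetilde{\phi}_{j,n}.$$

Thus, the modified measurements $\widetilde{Y} = [\widetilde{y}_1^T \; \widetilde{y}_2^T \; \ldots \; \widetilde{y}_J^T]^T$ and modified measurement matrix
$\widetilde{\Phi} = [\widetilde{\Phi}_1^T \; \widetilde{\Phi}_2^T \; \ldots \; \widetilde{\Phi}_J^T]^T$ can be used to refine the estimate of the common component of the signal,

$$\widetilde{z}_C = \widetilde{\Phi}^\dagger \widetilde{Y}, \qquad (18)$$

where $A^\dagger = (A^T A)^{-1} A^T$ denotes the pseudoinverse of matrix $A$.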

When the innovation support estimate is correct ($\widehat{\Omega}_j = \Omega_j$), the measurements $\widetilde{y}_j$ will describe
only the common component $z_C$. If this is true for every signal $j$ and the number of remaining
measurements $\sum_{j=1}^{J} (M_j - K_j) \ge N$, then $z_C$ can be perfectly recovered via (18). However, it may
be difficult to obtain correct estimates for all signal supports in the first iteration of the algorithm,
and so we find it preferable to refine the estimate of the support by executing several iterations.
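
The projection step (17) and the least-squares refinement (18) can be sketched as follows. This is an illustrative sketch, not the authors' code: the complement basis $Q_j$ is obtained here from a complete QR factorization (one of several ways to build it), the pseudoinverse is applied through `numpy.linalg.lstsq`, and the function names and demo sizes are hypothetical. The demo uses the true innovation supports, the case in which the common component should be recovered exactly.

```python
import numpy as np

def complement_basis(Phi, support):
    """Orthonormal columns spanning the complement of span(Phi[:, support])."""
    M = Phi.shape[0]
    if len(support) == 0:
        return np.eye(M)
    Q_full, _ = np.linalg.qr(Phi[:, support], mode="complete")
    return Q_full[:, len(support):]          # last M - |support| columns

def refine_common(ys, Phis, supports):
    """Apply (17) per sensor, stack the results, and solve (18)."""
    Y_t, Phi_t = [], []
    for y, Phi, S in zip(ys, Phis, supports):
        Q = complement_basis(Phi, S)
        Y_t.append(Q.T @ y)                  # modified measurements
        Phi_t.append(Q.T @ Phi)              # modified measurement matrix
    z, *_ = np.linalg.lstsq(np.vstack(Phi_t), np.concatenate(Y_t), rcond=None)
    return z

# Hypothetical demo: sum_j (M_j - K_j) = 80 >= N = 50.
rng = np.random.default_rng(2)
N, K, M, J = 50, 5, 10, 16
z_c = rng.standard_normal(N)
ys, Phis, supports = [], [], []
for _ in range(J):
    S = rng.choice(N, size=K, replace=False).tolist()
    z_j = np.zeros(N)
    z_j[S] = rng.standard_normal(K)
    Phi = rng.standard_normal((M, N))
    supports.append(S)
    Phis.append(Phi)
    ys.append(Phi @ (z_c + z_j))

z_hat = refine_common(ys, Phis, supports)
```

With incorrect support estimates the projected measurements still contain innovation energy, which is why the algorithm alternates between the two estimation steps.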

ACIE Algorithm for JSM-3 

1. Initialize: Set $\widehat{\Omega}_j = \emptyset$ for each $j$. Set the iteration counter $\ell = 1$.

2. Estimate common component: Update estimate $\widetilde{z}_C$ according to (17)-(18).

3. Estimate innovation supports: For each sensor $j$, after subtracting the contribution $\widetilde{z}_C$
from the measurements, $\widehat{y}_j = y_j - \Phi_j \widetilde{z}_C$, estimate the support of each signal innovation $\widehat{\Omega}_j$.

4. Iterate: If $\ell < L$, a preset number of iterations, then increment $\ell$ and return to Step 2.
Otherwise proceed to Step 5.

5. Estimate innovation coefficients: For each $j$, estimate the coefficients for the indices in $\widehat{\Omega}_j$

$$\widehat{z}_{j,\widehat{\Omega}_j} = \Phi_{j,\widehat{\Omega}_j}^\dagger (y_j - \Phi_j \widetilde{z}_C),$$

where $\widehat{z}_{j,\widehat{\Omega}_j}$ is a mutilated version of the innovation's sparse coefficient vector estimate $\widehat{z}_j$.

6. Recover signals: Compute the estimate of each signal as $\widehat{x}_j = \widetilde{z}_C + \widehat{z}_j$.

Estimation of the supports in Step 3 can be accomplished using a variety of techniques. We 
propose to run a fixed number of iterations of OMP; if the supports of the innovations are known 
to match across signals — as in JSM-2 — then more powerful algorithms like SOMP can be used. 
The ACIE algorithm is similar in spirit to other iterative estimation algorithms, such as turbo 
decoding [65]. 



5.3.3 Simulations for JSM-3 

We now present simulations of JSM-3 recovery for the following scenario. Consider $J$ signals of
length $N = 50$ containing a common white noise component $z_C(n) \sim \mathcal{N}(0, 1)$ for $n \in \{1, \ldots, N\}$.
Each innovations component $z_j$ has sparsity $K = 5$ (once again in the time domain), resulting in
$x_j = z_C + z_j$. The signals are generated according to the model used in Section 5.1.6.

We study two different cases. The first is an extension of JSM-1: we select the supports for the
various innovations separately and then apply OMP to each signal in Step 3 of the ACIE algorithm
in order to estimate its innovations component. The second case is an extension of JSM-2: we select
one common support for all of the innovations across the signals and then apply the DCS-SOMP
algorithm (Section 5.2.2) to estimate the innovations in Step 3. In both cases we use $L = 10$
iterations of ACIE. We test the algorithms for different numbers of signals $J$ and calculate the
probability of correct recovery as a function of the (same) number of measurements per signal $M$.

Figure 6: Recovering a signal ensemble with nonsparse common component and sparse innovations (JSM-3)
using ACIE. (a) Recovery using OMP separately on each signal in Step 3 of the ACIE algorithm (innovations
have arbitrary supports). (b) Recovery using DCS-SOMP jointly on all signals in Step 3 of the ACIE
algorithm (innovations have identical supports). Signal length $N = 50$, sparsity $K = 5$. The common
structure exploited by DCS-SOMP enables dramatic savings in the number of measurements. We average
over 1000 simulation runs.

Figure 6(a) shows that, for sufficiently large $J$, we can recover all of the signals with significantly
fewer than $N$ measurements per signal. Two effects are worth noting. First, as $J$ grows, it becomes
more difficult to perfectly recover all $J$ signals. We believe this is inevitable, because even if $z_C$
were known without error, perfect ensemble recovery would require the successful execution of $J$
independent runs of OMP. Second, for small $J$, the probability of success can decrease at high values
of $M$. We believe this behavior is due to the fact that initial errors in estimating $z_C$ may tend to be
somewhat sparse (since $\widetilde{z}_C$ roughly becomes an average of the signals $\{x_j\}$), and these sparse errors
can mislead the subsequent OMP processes. For more moderate $M$, it seems that the errors in estimating
$z_C$ (though greater) tend to be less sparse. We expect that a more sophisticated algorithm could
alleviate such a problem; the problem is also mitigated at higher $J$.

Figure 6(b) shows that when the sparse innovations share common supports we see an even 
greater savings. As a point of reference, a traditional approach to signal acquisition would require 
1600 total measurements to recover these J = 32 nonsparse signals of length N = 50. Our approach 
requires only approximately 10 random measurements per sensor — a total of 320 measurements 
— for high probability of recovery. 

6 Discussion and Conclusions 

In this paper we have extended the theory and practice of compressive sensing to multi-signal, 
distributed settings. The number of noiseless measurements required for ensemble recovery is de- 
termined by the dimensionality of the subspace in the relevant signal model, because dimensionality 
and sparsity play a volumetric role akin to the entropy used to characterize rates in source cod- 
ing. Our three example joint sparsity models (JSMs) for signal ensembles with both intra- and 
inter-signal correlations capture the essence of real physical scenarios, illustrate the basic analysis 
and algorithmic techniques, and indicate the significant gains to be realized from joint recovery. In 
some sense, distributed compressive sensing (DCS) is a framework for distributed compression of
sources with memory, which has remained a challenging problem for some time.

In addition to offering substantially reduced measurement rates, the DCS-based distributed 
source coding schemes we develop here share the properties of CS mentioned in Section 2. Two 
additional properties of DCS make it well-matched to distributed applications such as sensor net- 
works and arrays [51, 52]. First, each sensor encodes its measurements separately, which eliminates 
the need for inter-sensor communication. Second, DCS distributes its computational complexity 
asymmetrically, placing most of it in the joint decoder, which will often have more computational 
resources than any individual sensor node. The encoders are very simple; they merely compute 
incoherent projections with their signals and make no decisions. 

There are many opportunities for applications and extensions of these ideas. First, natural
signals are not exactly sparse but rather can be better modeled as $\ell_p$-compressible with $0 < p \le 1$.
Roughly speaking, a signal in a weak-$\ell_p$ ball has coefficients that decay as $n^{-1/p}$ once sorted
according to magnitude [24]. The key concept is that the ordering of these coefficients is important.
For JSM-2, we can extend the notion of simultaneous sparsity for $\ell_p$-sparse signals whose sorted
coefficients obey roughly the same ordering [66]. This condition could perhaps be enforced as an
$\ell_p$ constraint on the composite signal

$$\left\{ \sum_{j=1}^{J} |x_j(1)|, \; \sum_{j=1}^{J} |x_j(2)|, \; \ldots, \; \sum_{j=1}^{J} |x_j(N)| \right\}.$$

Second, (random) measurements are real numbers; quantization gradually degrades the recov- 
ery quality as the quantization becomes coarser [34, 67, 68]. Moreover, in many practical situations 
some amount of measurement noise will corrupt the $\{x_j\}$, making them not exactly sparse in any
basis. While characterizing these effects and the resulting rate-distortion consequences in the DCS 
setting are topics for future work, there has been work in the single-signal CS literature that we 
should be able to leverage, including variants of Basis Pursuit with Denoising [63, 69], robust iter- 
ative recovery algorithms [64], CS noise sensitivity analysis [25,34], the Dantzig Selector [33], and 
one-bit CS [70]. 

Third, in some applications, the linear program associated with some DCS decoders (in JSM-1 
and JSM-3) could prove too computationally intense. As we saw in JSM-2, efficient iterative and 
greedy algorithms could come to the rescue, but these need to be extended to the multi-signal 
case. Recent results on recovery from a union of subspaces give promise for efficient, model-based 
algorithms [66]. 

Finally, we focused our theory on models that assign common and innovation components 
to the signals in the ensemble. Other models tailored to specific applications can be posed; for 
example, in hyperspectral imaging applications, it is common to obtain strong correlations only 
across spectral slices within a certain neighborhood. It would then be appropriate to pose a
common/innovation model with separate common components that are localized to a subset of the 
spectral slices obtained. Results similar to those obtained in Section 4 are simple to derive for 
models with full-rank location matrices. 

A Proof of Theorem 1 

Statement 2 is an application of the achievable bound of Theorem 4 to the case of $J = 1$ signal. It
remains then to prove Statements 1 and 3.

Statement 1 (Achievable, $M \ge 2K$): We first note that, if $K \ge N/2$, then with probability
one, the matrix $\Phi$ has rank $N$, and there is a unique (correct) recovery. Thus we assume that
$K < N/2$. With probability one, all subsets of up to $2K$ columns drawn from $\Phi$ are linearly
independent. Assuming this holds, then for two index sets $\Omega \ne \widehat{\Omega}$ such that $|\Omega| = |\widehat{\Omega}| = K$,
$\mathrm{colspan}(\Phi_\Omega) \cap \mathrm{colspan}(\Phi_{\widehat{\Omega}})$ has dimension equal to the number of indices common to both $\Omega$ and
$\widehat{\Omega}$. A signal projects to this common space only if its coefficients are nonzero on exactly these (fewer
than $K$) common indices; since $\|\theta\|_0 = K$, this does not occur. Thus every $K$-sparse signal projects
to a unique point in $\mathbb{R}^M$.

Statement 3 (Converse, $M \le K$): If $M < K$, there is insufficient information in the vector
$y$ to recover the $K$ nonzero coefficients of $\theta$; thus we assume $M = K$. In this case, there is a
single explanation for the measurements only if there is a single set $\Omega$ of $K$ linearly independent
columns and the nonzero indices of $\theta$ are the elements of $\Omega$. Aside from this pathological case, the
rank of subsets $\Phi_{\widehat{\Omega}}$ will generally be less than $K$ (which would prevent robust recovery of signals
supported on $\widehat{\Omega}$) or will be equal to $K$ (which would give ambiguous solutions among all such
sets $\widehat{\Omega}$). □
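
The contrast between Statements 1 and 3 can be checked numerically on a toy problem by enumerating every candidate support. The sketch below is an illustration under assumed parameters ($N = 10$, $K = 2$) and a hypothetical helper `l0_explanations`; it is a brute-force search, not a practical recovery algorithm. With $M = 2K$ Gaussian measurements only the true support should explain $y$, whereas with $M = K$ every support of size $K$ does.

```python
import itertools
import numpy as np

def l0_explanations(y, Phi, K, tol=1e-9):
    """Return every size-K support whose columns reproduce y exactly."""
    N = Phi.shape[1]
    hits = []
    for S in itertools.combinations(range(N), K):
        cols = Phi[:, list(S)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        if np.linalg.norm(cols @ coef - y) <= tol * np.linalg.norm(y):
            hits.append(S)
    return hits

rng = np.random.default_rng(4)
N, K = 10, 2
support = tuple(int(i) for i in np.sort(rng.choice(N, size=K, replace=False)))
theta = np.zeros(N)
theta[list(support)] = rng.standard_normal(K)

Phi_2K = rng.standard_normal((2 * K, N))   # M = 2K measurements
Phi_K = rng.standard_normal((K, N))        # M = K measurements

hits_2K = l0_explanations(Phi_2K @ theta, Phi_2K, K)
hits_K = l0_explanations(Phi_K @ theta, Phi_K, K)
```

Here `hits_K` contains all 45 supports of size 2, which is the ambiguity described in the proof of Statement 3.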



B Proof of Theorem 3 

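We let

$$D := K_C + \sum_{j \in \Lambda} K_j \qquad (19)$$

denote the number of columns in $P$. Because $P \in \mathcal{P}_F(X)$, there exists $\Theta \in \mathbb{R}^D$ such that $X = P\Theta$.
Because $Y = \Phi X$, $\Theta$ is a solution to $Y = \Phi P \Theta$. We will argue that, with probability one over $\Phi$,

$$\Upsilon := \Phi P$$

has rank $D$, and thus $\Theta$ is the unique solution to the equation $Y = \Phi P \Theta = \Upsilon \Theta$.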

We recall that, under our common/innovation model, $P$ has the form (5), where $P_C$ is an
$N \times K_C$ submatrix of the $N \times N$ identity, and each $P_j$, $j \in \Lambda$, is an $N \times K_j$ submatrix of the
$N \times N$ identity. To prove that $\Upsilon$ has rank $D$, we will require the following lemma, which we prove
in Appendix C.

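Lemma 2 If (7) holds, then there exists a mapping $\mathcal{C} : \{1, 2, \ldots, K_C\} \to \Lambda$, assigning each element
of the common component to one of the sensors, such that for each $\Gamma \subseteq \Lambda$,

$$\sum_{j \in \Gamma} M_j \ge \sum_{j \in \Gamma} K_j + \sum_{k=1}^{K_C} 1_{\mathcal{C}(k) \in \Gamma} \qquad (20)$$

and such that for each $k \in \{1, 2, \ldots, K_C\}$, the $k$-th column of $P_C$ is not a column of $P_{\mathcal{C}(k)}$.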

Intuitively, the existence of such a mapping suggests that (i) each sensor has taken enough
measurements to cover its own innovation (requiring $K_j$ measurements) and perhaps some of the
common component, (ii) for any $\Gamma \subseteq \Lambda$, the sensors in $\Gamma$ have collectively taken enough extra
measurements to cover the requisite $K_C(\Gamma, P)$ elements of the common component, and (iii) the
extra measurements are taken at sensors where the common and innovation components do not
overlap. Formally, we will use the existence of such a mapping to prove that $T$ has rank $D$.
We proceed by noting that $T$ has the form

$$T = \begin{bmatrix}
\Phi_1 P_C & \Phi_1 P_1 & 0 & \cdots & 0 \\
\Phi_2 P_C & 0 & \Phi_2 P_2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\Phi_J P_C & 0 & 0 & \cdots & \Phi_J P_J
\end{bmatrix},$$

where each $\Phi_j P_C$ (respectively, $\Phi_j P_j$) is an $M_j \times K_C$ (respectively, $M_j \times K_j$) submatrix of $\Phi_j$
obtained by selecting columns from $\Phi_j$ according to the nonzero entries of $P_C$ (respectively, $P_j$).
In total, $T$ has $D$ columns (19). To argue that $T$ has rank $D$, we will consider a sequence of three
matrices $T_0$, $T_1$, and $T_2$ constructed from small modifications to $T$.

We begin by letting $T_0$ denote the "partially zeroed" matrix obtained from $T$ using the following
construction. We first let $T_0 = T$ and then make the following adjustments:

1. Let $k = 1$.

2. For each $j$ such that $P_j$ has a column that matches column $k$ of $P_C$ (note that by Lemma 2
this cannot happen if $\mathcal{C}(k) = j$), let $k'$ represent the column index of the full matrix $P$ where
this column of $P_j$ occurs. Subtract column $k'$ of $T_0$ from column $k$ of $T_0$. This forces to zero
all entries of $T_0$ formerly corresponding to column $k$ of the block $\Phi_j P_C$.

3. If $k < K_C$, add one to $k$ and go to step 2.



The matrix $T_0$ is identical to $T$ everywhere except on the first $K_C$ columns, where any portion of a
column overlapping with a column of $\Phi_j P_j$ to its right has been set to zero. Thus, $T_0$ satisfies the
next two properties, which will be inherited by matrices $T_1$ and $T_2$ that we subsequently define:

P1. Each entry of $T_0$ is either zero or a Gaussian random variable.
P2. All Gaussian random variables in $T_0$ are i.i.d.

Finally, because $T_0$ was constructed only by subtracting columns of $T$ from one another,

$$\mathrm{rank}(T_0) = \mathrm{rank}(T). \qquad (21)$$



We now let $T_1$ be the matrix obtained from $T_0$ using the following construction. For each
$j \in \Lambda$, we select $K_j + \sum_{k=1}^{K_C} 1_{\mathcal{C}(k)=j}$ arbitrary rows from the portion of $T_0$ corresponding to sensor
$j$. Using (19), the resulting matrix $T_1$ has

$$\sum_{j \in \Lambda} \left( K_j + \sum_{k=1}^{K_C} 1_{\mathcal{C}(k)=j} \right) = \sum_{j \in \Lambda} K_j + K_C = D$$

rows. Also, because $T_1$ was obtained by selecting a subset of rows from $T_0$, it has $D$ columns and
satisfies

$$\mathrm{rank}(T_1) \le \mathrm{rank}(T_0). \qquad (22)$$

We now let $T_2$ be the $D \times D$ matrix obtained by permuting columns of $T_1$ using the following
construction:

1. Let $T_2 = [\,]$, and let $j = 1$.

2. For each $k$ such that $\mathcal{C}(k) = j$, let $T_1(k)$ denote the $k$-th column of $T_1$, and concatenate $T_1(k)$
to $T_2$, i.e., let $T_2 \leftarrow [T_2 \; T_1(k)]$. There are $\sum_{k=1}^{K_C} 1_{\mathcal{C}(k)=j}$ such columns.

3. Let $T'_j$ denote the columns of $T_1$ corresponding to the entries of $\Phi_j P_j$ (the innovation
components of sensor $j$), and concatenate $T'_j$ to $T_2$, i.e., let $T_2 \leftarrow [T_2 \; T'_j]$. There are $K_j$ such
columns.

4. If $j < J$, let $j \leftarrow j + 1$ and go to Step 2.

Because $T_1$ and $T_2$ share the same columns up to reordering, it follows that

$$\mathrm{rank}(T_2) = \mathrm{rank}(T_1). \qquad (23)$$

Based on its dependency on $T_0$, and following from Lemma 2, the square matrix $T_2$ meets properties
P1 and P2 defined above in addition to a third property:

P3. All diagonal entries of $T_2$ are Gaussian random variables.

This follows because for each $j$, $K_j + \sum_{k=1}^{K_C} 1_{\mathcal{C}(k)=j}$ rows of $T_1$ are assigned in its construction, while
$K_j + \sum_{k=1}^{K_C} 1_{\mathcal{C}(k)=j}$ columns of $T_2$ are assigned in its construction. Thus, each diagonal element of
$T_2$ will either be an entry of some $\Phi_j P_j$, which remains Gaussian throughout our constructions, or
it will be an entry of some $k$-th column of some $\Phi_j P_C$ for which $\mathcal{C}(k) = j$. In the latter case, we
know by Lemma 2 and the construction of $T_0$ that this entry remains Gaussian throughout our
constructions.

Having identified these three properties satisfied by $T_2$, we will prove by induction that, with
probability one over $\Phi$, such a matrix has full rank.

Lemma 3 Let $T^{(d-1)}$ be a $(d-1) \times (d-1)$ matrix having full rank. Construct a $d \times d$ matrix $T^{(d)}$
as follows:

$$T^{(d)} := \begin{bmatrix} T^{(d-1)} & v_1 \\ v_2^T & \omega \end{bmatrix},$$

where $v_1, v_2 \in \mathbb{R}^{d-1}$ are vectors with each entry being either zero or a Gaussian random variable,
$\omega$ is a Gaussian random variable, and all random variables are i.i.d. and independent of $T^{(d-1)}$.
Then with probability one, $T^{(d)}$ has full rank.

Applying Lemma 3 inductively $D$ times, the success probability remains one. It follows that
with probability one over $\Phi$, $\mathrm{rank}(T_2) = D$. Combining this last result with (21)-(23), we obtain
$\mathrm{rank}(T) = D$ with probability one over $\Phi$. It remains to prove Lemma 3.
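As a sanity check on the rank claim, the block matrix $T = \Phi P$ can be assembled for a toy ensemble. This is a minimal sketch under assumed sizes: the supports, the measurement counts `M`, and the helper `selection_matrix` are hypothetical, chosen so that a mapping as in Lemma 2 exists (common index 1 overlaps the innovation of the second sensor, so it must be assigned to the first).

```python
import numpy as np

def selection_matrix(N, indices):
    """Columns of the N x N identity picked out by `indices`."""
    return np.eye(N)[:, indices]

rng = np.random.default_rng(1)
N = 6
common_idx = [0, 1]            # support of the common component (K_C = 2)
innov_idx = [[2], [1, 4]]      # innovation supports (K_1 = 1, K_2 = 2)
M = [2, 3]                     # measurements per sensor (illustrative choice)

P_C = selection_matrix(N, common_idx)
P_blocks = [selection_matrix(N, idx) for idx in innov_idx]
Phis = [rng.standard_normal((m, N)) for m in M]

# Row block j of T is [Phi_j P_C, 0, ..., Phi_j P_j, ..., 0].
rows = []
for j, Phi_j in enumerate(Phis):
    row = [Phi_j @ P_C]
    for i, P_i in enumerate(P_blocks):
        block = Phi_j @ P_i if i == j else np.zeros((M[j], P_i.shape[1]))
        row.append(block)
    rows.append(np.hstack(row))
T = np.vstack(rows)

D = len(common_idx) + sum(len(idx) for idx in innov_idx)
rank_T = np.linalg.matrix_rank(T)
```

For generic Gaussian draws the computed rank equals $D$, matching the conclusion above; the check is numerical and does not replace the probability-one argument.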

Proof of Lemma 3: When $d = 1$, $T^{(d)} = [\omega]$, which has full rank if and only if $\omega \neq 0$, which
occurs with probability one.

When $d > 1$, using expansion by minors, the determinant of $T^{(d)}$ satisfies

$$\det(T^{(d)}) = \omega \cdot \det(T^{(d-1)}) + C,$$

where $C = C(T^{(d-1)}, v_1, v_2)$ is independent of $\omega$. The matrix $T^{(d)}$ has full rank if and only if
$\det(T^{(d)}) \neq 0$, which is satisfied if and only if

$$\omega \neq \frac{-C}{\det(T^{(d-1)})}.$$

By assumption, $\det(T^{(d-1)}) \neq 0$ and $\omega$ is a Gaussian random variable that is independent of $C$ and
$\det(T^{(d-1)})$. Thus, $\omega \neq \frac{-C}{\det(T^{(d-1)})}$ with probability one. □
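The affine dependence of the determinant on $\omega$ can be verified numerically. The sketch below is illustrative only: it draws a random bordered matrix (the helper `bordered` and the size `d` are assumptions), obtains $C$ by setting $\omega = 0$, and confirms that the determinant vanishes only at the single critical value of $\omega$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
T_prev = rng.standard_normal((d - 1, d - 1))   # plays the role of T^(d-1)
v1 = rng.standard_normal((d - 1, 1))
v2 = rng.standard_normal((1, d - 1))

def bordered(omega):
    """Assemble T^(d) = [[T^(d-1), v1], [v2^T, omega]]."""
    return np.block([[T_prev, v1], [v2, np.array([[omega]])]])

# C is the determinant with omega = 0, so it does not depend on omega.
C = np.linalg.det(bordered(0.0))
omega = rng.standard_normal()
lhs = np.linalg.det(bordered(omega))
rhs = omega * np.linalg.det(T_prev) + C

# The single value of omega that makes T^(d) singular.
critical_omega = -C / np.linalg.det(T_prev)
det_at_critical = np.linalg.det(bordered(critical_omega))
```

Since a continuous random variable hits one fixed value with probability zero, the bordered matrix is full rank almost surely.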
C Proof of Lemma 2

To prove this lemma, we apply tools from graph theory.

We seek a matching within the graph $G = (V_V, V_M, E)$ from Figure 1, i.e., a subgraph
$(V_V, V_M, \bar{E})$ with $\bar{E} \subseteq E$ that pairs each element of $V_V$ with a unique element of $V_M$. Such a
matching will immediately give us the desired mapping $\mathcal{C}$ as follows: for each $k \in \{1, 2, \ldots, K_C\} \subseteq V_V$,
we let $(j, m) \in V_M$ denote the single node matched to $k$ by an edge in $\bar{E}$, and we set $\mathcal{C}(k) = j$.

To prove the existence of such a matching within the graph, we invoke a version of Hall's
marriage theorem for bipartite graphs [71]. Hall's theorem states that within a bipartite graph
$(V_1, V_2, E)$, there exists a matching that assigns each element of $V_1$ to a unique element of $V_2$ if for
any collection of elements $\Pi \subseteq V_1$, the set $E(\Pi)$ of neighbors of $\Pi$ in $V_2$ has cardinality $|E(\Pi)| \ge |\Pi|$.

In the context of our lemma, Hall's condition requires that for any set of entries in the value
vector, $\Pi \subseteq V_V$, the set $E(\Pi)$ of neighbors of $\Pi$ in $V_M$ has size $|E(\Pi)| \ge |\Pi|$. We will prove that if
(7) is satisfied, then Hall's condition is satisfied, and thus a matching must exist.

Let us consider an arbitrary set $\Pi \subseteq V_V$. We let $E(\Pi)$ denote the set of neighbors of $\Pi$ in $V_M$
joined by edges in $E$, and we let $S_\Pi = \{ j \in \Lambda : (j, m) \in E(\Pi) \text{ for some } m \}$. Thus, $S_\Pi \subseteq \Lambda$ denotes
the set of signal indices whose measurement nodes have edges that connect to $\Pi$. It follows that
$|E(\Pi)| = \sum_{j \in S_\Pi} M_j$. Thus, in order to satisfy Hall's condition for $\Pi$, we require

$$\sum_{j \in S_\Pi} M_j \ge |\Pi|. \qquad (24)$$

We would now like to show that $\sum_{j \in S_\Pi} K_j + K_C(S_\Pi, P) \ge |\Pi|$, and thus if (7) is satisfied for all
$\Gamma \subseteq \Lambda$, then (24) is satisfied in particular for $S_\Pi \subseteq \Lambda$.

In general, the set $\Pi$ may contain vertices for both common components and innovation
components. We write $\Pi = \Pi_I \cup \Pi_C$ to denote the disjoint union of these two sets.

By construction, $|\Pi_I| \le \sum_{j \in S_\Pi} K_j$ because we count all innovations with neighbors in $S_\Pi$, and
because $S_\Pi$ contains all neighbors for nodes in $\Pi_I$. We will also argue that $K_C(S_\Pi, P) \ge |\Pi_C|$ as
follows. By definition, for a set $\Gamma \subseteq \Lambda$, $K_C(\Gamma, P)$ counts the number of columns in $P_C$ that also
appear in $P_j$ for all $j \notin \Gamma$. By construction, for each $k \in \Pi_C$, node $k$ has no connection to nodes
$(j, m)$ for $j \notin S_\Pi$; thus it must follow that the $k$-th column of $P_C$ is present in $P_j$ for all $j \notin S_\Pi$, due
to the construction of the graph $G$. Consequently, $K_C(S_\Pi, P) \ge |\Pi_C|$.

Thus, $\sum_{j \in S_\Pi} K_j + K_C(S_\Pi, P) \ge |\Pi_I| + |\Pi_C| = |\Pi|$, and so (7) implies (24) for any $\Pi$, and so
Hall's condition is satisfied, and a matching exists. Because in such a matching a set of vertices in
$V_M$ matches to a set in $V_V$ of lower or equal cardinality, we have in particular that (20) holds for
each $\Gamma \subseteq \Lambda$. □
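To make the matching argument concrete, the following sketch finds a matching with a standard augmenting-path search (Kuhn's algorithm, swapped in here as a constructive stand-in for the existence proof). The toy graph is an assumption, since Figure 1 is not reproduced in this appendix: value nodes are labeled `("common", k)` or `("innov", j, i)`, measurement nodes are pairs `(j, m)`, and a common node is linked only to sensors whose innovation support does not contain that common index.

```python
def maximum_matching(adj):
    """Kuhn's augmenting-path algorithm; adj maps each left node to its right neighbors."""
    match_right = {}

    def try_assign(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be moved elsewhere.
            if v not in match_right or try_assign(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_assign(u, set())
    return {u: v for v, u in match_right.items()}

M = [2, 3]  # hypothetical measurement counts for sensors 0 and 1
slots = {j: [(j, m) for m in range(M[j])] for j in range(len(M))}
adj = {
    ("common", 0): slots[0] + slots[1],  # common index 0 overlaps no innovation
    ("common", 1): slots[0],             # common index 1 overlaps sensor 1's innovation
    ("innov", 0, 0): slots[0],
    ("innov", 1, 0): slots[1],
    ("innov", 1, 1): slots[1],
}
matching = maximum_matching(adj)
# The mapping of Lemma 2: each common index goes to the sensor of its matched node.
C_map = {node[1]: matching[node][0] for node in adj if node[0] == "common"}
```

In this example every value node is matched, and the induced mapping sends common index 0 to sensor 1 and common index 1 to sensor 0.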

D Proof of Theorem 4 

Given the measurements $Y$ and measurement matrix $\Phi$, we will show that it is possible to recover
some $P \in \mathcal{P}_F(X)$ and a corresponding vector $\Theta$ such that $X = P\Theta$ using the following algorithm:

• Take the last measurement of each sensor for verification, and sum these $J$ measurements to
obtain a single global test measurement $\bar{y}$. Similarly, add the corresponding rows of $\Phi$ into a
single row $\bar{\phi}$.

• Group all the remaining $\sum_{j \in \Lambda} M_j - J$ measurements into a vector $\bar{Y}$ and a matrix $\bar{\Phi}$.

• For each matrix $P \in \mathcal{P}$:

  - choose a single solution $\Theta_P$ to $\bar{Y} = \bar{\Phi} P \Theta_P$ independently of $\bar{\phi}$; if no solution exists,
    skip the next two steps;

  - define $X_P = P\Theta_P$;

  - cross-validate: check if $\bar{y} = \bar{\phi} X_P$; if so, return the estimate $(P, \Theta_P)$; if not, continue with
    the next matrix.

We begin by showing that, with probability one over $\Phi$, the algorithm only terminates when it
gets a correct solution; in other words, that for each $P \in \mathcal{P}$ the cross-validation measurement $\bar{y}$
can determine whether $X_P = X$. We note that all entries of the vector $\bar{\phi}$ are i.i.d. Gaussian, and
independent from $\bar{\Phi}$. Assume for the sake of contradiction that there exists a matrix $P \in \mathcal{P}$ such
that $\bar{y} = \bar{\phi} X_P$, but $X_P = P\Theta_P \neq X$; this implies $\bar{\phi}(X - X_P) = 0$, which occurs with probability
zero over $\Phi$. Thus, if $X_P \neq X$, then $\bar{\phi} X_P \neq \bar{y}$ with probability one over $\Phi$. Since we only need to
search over a finite number of matrices $P \in \mathcal{P}$, cross validation will determine whether each matrix
$P \in \mathcal{P}$ gives the correct solution with probability one.

We now show that there is a matrix in $\mathcal{P}$ for which the algorithm will terminate with the correct
solution. We know that the matrix $P^* \in \mathcal{P}_F(X) \subseteq \mathcal{P}$ will be part of our search, and that the unique
solution $\Theta_{P^*}$ to $\bar{Y} = \bar{\Phi} P^* \Theta_{P^*}$ yields $X = P^* \Theta_{P^*}$ when (8) holds for $P^*$, as shown in Theorem 3.
Thus, the algorithm will find at least one matrix $P$ and vector $\Theta_P$ such that $X = P\Theta_P$; when such
a matrix is found the cross-validation step will return this solution and end the algorithm. □
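The search-and-verify procedure can be sketched in a simplified setting. The code below assumes a single sensor ($J = 1$), so the held-out test measurement is just the last row, and it takes the candidate set to be all supports of size one or two, ordered by size; the function name `recover_with_cross_validation` and these choices are illustrative, not the paper's general multi-sensor algorithm.

```python
import itertools
import numpy as np

def recover_with_cross_validation(Phi, y, candidates, tol=1e-8):
    """Try each candidate support; accept the first that also explains the held-out row."""
    Phi_fit, y_fit = Phi[:-1], y[:-1]      # measurements used for fitting
    phi_test, y_test = Phi[-1], y[-1]      # held-out cross-validation measurement
    N = Phi.shape[1]
    for support in candidates:
        P = np.eye(N)[:, list(support)]
        A = Phi_fit @ P
        theta, *_ = np.linalg.lstsq(A, y_fit, rcond=None)
        if np.linalg.norm(A @ theta - y_fit) > tol:
            continue                        # no exact solution for this P
        x_hat = P @ theta
        if abs(phi_test @ x_hat - y_test) < tol:
            return support, x_hat           # cross-validation passed
    return None

rng = np.random.default_rng(3)
N, K = 6, 2
M = K + 1                                   # K fitting rows plus one test row
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[1, 4]] = [1.5, -0.7]
y = Phi @ x_true
candidates = [s for size in (1, 2) for s in itertools.combinations(range(N), size)]
result = recover_with_cross_validation(Phi, y, candidates)
```

Wrong supports of size two fit the $K$ fitting rows exactly but fail the held-out check for generic Gaussian draws, so the search stops at the true support.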

Remark 2 Consider the algorithm used in the proof: if the matrices in $\mathcal{P}$ are sorted by number
of columns, then the algorithm is akin to $\ell_0$-norm minimization on $\Theta$ with an additional
cross-validation step. The $\ell_0$-norm minimization algorithm is known to be optimal for recovery of strictly
sparse signals from noiseless measurements.

E Proof of Theorem 5 

We let $D$ denote the number of columns in $P$. Because $P \in \mathcal{P}_F(X)$, there exists $\Theta \in \mathbb{R}^D$ such that
$X = P\Theta$. Because $Y = \Phi X$, then $\Theta$ is a solution to $Y = \Phi P \Theta$. We will argue for $T := \Phi P$ that
$\mathrm{rank}(T) < D$, and thus there exists $\widehat{\Theta} \neq \Theta$ such that $Y = T\Theta = T\widehat{\Theta}$. Moreover, since $P$ has full
rank, it follows that $\widehat{X} := P\widehat{\Theta} \neq P\Theta = X$.

We let $T_0$ be the "partially zeroed" matrix obtained from $T$ using the identical procedure
detailed in Appendix B. Again, because $T_0$ was constructed only by subtracting columns of $T$
from one another, it follows that $\mathrm{rank}(T_0) = \mathrm{rank}(T)$.

Suppose $\Gamma \subseteq \Lambda$ is a set for which (9) holds. We let $T_1$ be the submatrix of $T_0$ obtained by
selecting the following columns:

• For any $k \in \{1, 2, \ldots, K_C\}$ such that column $k$ of $P_C$ also appears as a column in all $P_j$ for
$j \notin \Gamma$, we include column $k$ of $T_0$ as a column in $T_1$. There are $K_C(\Gamma, P)$ such columns $k$.

• For any $k \in \{K_C + 1, K_C + 2, \ldots, D\}$ such that column $k$ of $P$ corresponds to an innovation
for some sensor $j \in \Gamma$, we include column $k$ of $T_0$ as a column in $T_1$. There are $\sum_{j \in \Gamma} K_j$
such columns $k$.

This submatrix has $\sum_{j \in \Gamma} K_j + K_C(\Gamma, P)$ columns. Because $T_0$ has the same size as $T$, and in
particular has only $D$ columns, then in order to have that $\mathrm{rank}(T_0) = D$, it is necessary that all
$\sum_{j \in \Gamma} K_j + K_C(\Gamma, P)$ columns of $T_1$ be linearly independent.
Based on the method described for constructing $T_0$, it follows that $T_1$ is zero for all
measurement rows not corresponding to the set $\Gamma$. Therefore, consider the submatrix $T_2$ of $T_1$ obtained
by selecting only the measurement rows corresponding to the set $\Gamma$. Because of the zeros in $T_1$, it
follows that $\mathrm{rank}(T_1) = \mathrm{rank}(T_2)$. However, since $T_2$ has only $\sum_{j \in \Gamma} M_j$ rows, we invoke (9) and
have that $\mathrm{rank}(T_1) = \mathrm{rank}(T_2) \le \sum_{j \in \Gamma} M_j < \sum_{j \in \Gamma} K_j + K_C(\Gamma, P)$. Thus, all $\sum_{j \in \Gamma} K_j + K_C(\Gamma, P)$
columns of $T_1$ cannot be linearly independent, and so $T$ does not have full rank. □



F Proof of Lemma 1 

Necessary conditions on innovation components: We begin by proving that in order to recover
$z_C$, $z_1$, and $z_2$ via the $\gamma$-weighted $\ell_1$-norm formulation it is necessary that $z_1$ can be recovered via
single-signal $\ell_1$-norm minimization using $\Phi_1$ and measurements $y_1 = \Phi_1 z_1$.

Consider the single-signal $\ell_1$-norm minimization problem

$$\widehat{z}_1 = \arg\min \|z_1\|_1 \quad \text{s.t.} \quad y_1 = \Phi_1 z_1.$$

Suppose that this $\ell_1$-norm minimization for $z_1$ fails; that is, there exists $\widehat{z}_1 \neq z_1$ such that $y_1 = \Phi_1 \widehat{z}_1$
and $\|\widehat{z}_1\|_1 \le \|z_1\|_1$. Therefore, substituting $\widehat{z}_1$ instead of $z_1$ in the $\gamma$-weighted $\ell_1$-norm formulation
(12) provides an alternate explanation for the measurements with a smaller or equal modified
$\ell_1$-norm penalty. Consequently, recovery of $z_1$ using (12) will fail and we will recover $x_1$ incorrectly.
We conclude that the single-signal $\ell_1$-norm minimization of $z_1$ using $\Phi_1$ is necessary for successful
recovery using the $\gamma$-weighted $\ell_1$-norm formulation. A similar condition for $\ell_1$-norm minimization
of $z_2$ using $\Phi_2$ and measurements $\Phi_2 z_2$ can be proved in an analogous manner.

Necessary condition on common component: We now prove that in order to recover $z_C$,
$z_1$, and $z_2$ via the $\gamma$-weighted $\ell_1$-norm formulation it is necessary that $z_C$ can be recovered via
single-signal $\ell_1$-norm minimization using the joint matrix $[\Phi_1^T \; \Phi_2^T]^T$ and measurements $[\Phi_1^T \; \Phi_2^T]^T z_C$.

The proof is very similar to the previous proof for the innovation component $z_1$. Consider the
single-signal $\ell_1$-norm minimization

$$\widehat{z}_C = \arg\min \|z_C\|_1 \quad \text{s.t.} \quad y_C = [\Phi_1^T \; \Phi_2^T]^T z_C.$$

Suppose that this $\ell_1$-norm minimization for $z_C$ fails; that is, there exists $\widehat{z}_C \neq z_C$ such that
$y_C = [\Phi_1^T \; \Phi_2^T]^T \widehat{z}_C$ and $\|\widehat{z}_C\|_1 \le \|z_C\|_1$. Therefore, substituting $\widehat{z}_C$ instead of $z_C$ in the $\gamma$-weighted $\ell_1$-norm
formulation (12) provides an alternate explanation for the measurements with a smaller or equal modified
$\ell_1$-norm penalty. Consequently, the recovery of $z_C$ using the $\gamma$-weighted $\ell_1$-norm formulation (12)
will fail, and thus we will recover $x_1$ and $x_2$ incorrectly. We conclude that the single-signal $\ell_1$-norm
minimization of $z_C$ using $[\Phi_1^T \; \Phi_2^T]^T$ is necessary for successful recovery using the $\gamma$-weighted
$\ell_1$-norm formulation. □



G Proof of Theorem 6 

We construct measurement matrices $\Phi_1$ and $\Phi_2$ that consist of two sets of rows. The first set of
rows is identical in both and recovers the signal difference $x_1 - x_2$. The second set is different and
recovers the signal average $\frac{1}{2}x_1 + \frac{1}{2}x_2$. Let the submatrix formed by the identical rows for the signal
difference be $\Phi_D$, and let the submatrices formed by unique rows for the signal average be $\Phi_{A,1}$
and $\Phi_{A,2}$. Thus the measurement matrices $\Phi_1$ and $\Phi_2$ are of the following form:

$$\Phi_1 = \begin{bmatrix} \Phi_D \\ \Phi_{A,1} \end{bmatrix} \quad \text{and} \quad \Phi_2 = \begin{bmatrix} \Phi_D \\ \Phi_{A,2} \end{bmatrix}.$$

The submatrices $\Phi_D$, $\Phi_{A,1}$, and $\Phi_{A,2}$ contain i.i.d. Gaussian entries. Once the difference $x_1 - x_2$
and average $\frac{1}{2}x_1 + \frac{1}{2}x_2$ have been recovered using the above technique, the computation of $x_1$ and
$x_2$ is straightforward. The measurement rate can be computed by considering both parts of the
measurement matrices.

Recovery of signal difference: The submatrix $\Phi_D$ is used to recover the signal difference.
By subtracting the product of $\Phi_D$ with the signals $x_1$ and $x_2$, we have

$$\Phi_D x_1 - \Phi_D x_2 = \Phi_D (x_1 - x_2).$$

In the original representation we have $x_1 - x_2 = z_1 - z_2$ with sparsity rate $2S_I$. But $z_1(n) - z_2(n)$
is nonzero only if $z_1(n)$ is nonzero or $z_2(n)$ is nonzero. Therefore, the sparsity rate of $x_1 - x_2$ is
equal to the sum of the individual sparsities reduced by the sparsity rate of the overlap, and so we
have $S(X_1 - X_2) = 2S_I - (S_I)^2$. Therefore, any measurement rate greater than $c'(2S_I - (S_I)^2)$ for
each $\Phi_D$ permits recovery of the length-$N$ signal $x_1 - x_2$. (As always, the probability of correct
recovery approaches one as $N$ increases.)

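Recovery of average: Once $x_1 - x_2$ has been recovered, we have

$$x_1 - \tfrac{1}{2}(x_1 - x_2) = \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 = x_2 + \tfrac{1}{2}(x_1 - x_2).$$

At this stage, we know $x_1 - x_2$, $\Phi_D x_1$, $\Phi_D x_2$, $\Phi_{A,1} x_1$, and $\Phi_{A,2} x_2$. We have

$$\Phi_D x_1 - \tfrac{1}{2}\Phi_D (x_1 - x_2) = \Phi_D \left( \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 \right),$$
$$\Phi_{A,1} x_1 - \tfrac{1}{2}\Phi_{A,1} (x_1 - x_2) = \Phi_{A,1} \left( \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 \right),$$
$$\Phi_{A,2} x_2 + \tfrac{1}{2}\Phi_{A,2} (x_1 - x_2) = \Phi_{A,2} \left( \tfrac{1}{2}x_1 + \tfrac{1}{2}x_2 \right),$$

where $\Phi_D(x_1 - x_2)$, $\Phi_{A,1}(x_1 - x_2)$, and $\Phi_{A,2}(x_1 - x_2)$ are easily computable because $(x_1 - x_2)$
has been recovered. The signal $\frac{1}{2}x_1 + \frac{1}{2}x_2$ is of length $N$; its sparsity rate is equal to the sum of
the individual sparsities $S_C + 2S_I$ reduced by the sparsity rate of the overlaps, and so we have
$S(\frac{1}{2}X_1 + \frac{1}{2}X_2) = S_C + 2S_I - 2S_C S_I - (S_I)^2 + S_C (S_I)^2$. Therefore, any measurement rate greater
than $c'(S_C + 2S_I - 2S_C S_I - (S_I)^2 + S_C (S_I)^2)$ aggregated over the matrices $\Phi_D$, $\Phi_{A,1}$, and $\Phi_{A,2}$
enables recovery of $\frac{1}{2}x_1 + \frac{1}{2}x_2$.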

Computation of measurement rate: By considering the requirements on $\Phi_D$, the individual
measurement rates $R_1$ and $R_2$ must satisfy (13a). Combining the measurement rates required for
$\Phi_{A,1}$ and $\Phi_{A,2}$, the sum measurement rate satisfies (13b). We complete the proof by noting that
$c'(\cdot)$ is continuous and that $\lim_{S \to 0} c'(S) = 0$. Thus, as $S_I$ goes to zero, the limit of the sum
measurement rate is $c'(S_C)$. □
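The two sparsity-rate expressions used in this proof can be checked empirically. The sketch below assumes that each entry of $z_C$, $z_1$, and $z_2$ is nonzero independently with probability $S_C$, $S_I$, and $S_I$ respectively (an i.i.d. Bernoulli support model with Gaussian values); the function names and the numerical values are illustrative.

```python
import numpy as np

def difference_sparsity_rate(S_I):
    """Sparsity rate of x1 - x2 = z1 - z2: union of two independent supports."""
    return 2 * S_I - S_I**2

def average_sparsity_rate(S_C, S_I):
    """Sparsity rate of (x1 + x2)/2: union of the three independent supports."""
    return S_C + 2 * S_I - 2 * S_C * S_I - S_I**2 + S_C * S_I**2

rng = np.random.default_rng(4)
N, S_C, S_I = 200_000, 0.2, 0.1

def sparse(rate):
    """Length-N vector whose entries are nonzero independently with probability `rate`."""
    return rng.standard_normal(N) * (rng.random(N) < rate)

z_C, z_1, z_2 = sparse(S_C), sparse(S_I), sparse(S_I)
x_1, x_2 = z_C + z_1, z_C + z_2
empirical_diff = np.mean(x_1 - x_2 != 0)
empirical_avg = np.mean(0.5 * x_1 + 0.5 * x_2 != 0)
```

Both closed forms are the complement of "all contributing entries are zero": $1 - (1 - S_I)^2$ for the difference and $1 - (1 - S_C)(1 - S_I)^2$ for the average.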

H Proof of Theorem 7 

We assume that $\Psi$ is an orthonormal matrix. Like $\Phi_j$ itself, the matrix $\Phi_j \Psi$ also has i.i.d. $\mathcal{N}(0, 1)$
entries, since $\Psi$ is orthonormal. For convenience, we assume $\Psi = I_N$. The results presented can be
easily extended to a more general orthonormal matrix $\Psi$ by replacing $\Phi_j$ with $\Phi_j \Psi$.

Assume without loss of generality that $\Omega = \{1, 2, \ldots, K\}$ for convenience of notation. Thus,
the correct estimates are $n \le K$, and the incorrect estimates are $n \ge K + 1$. Now consider the
statistic $\xi_n$ in (14). This is the sample mean of $J$ i.i.d. variables. The variables $\langle y_j, \phi_{j,n} \rangle^2$ are i.i.d.
since each $y_j = \Phi_j x_j$, and $\Phi_j$ and $x_j$ are i.i.d. Furthermore, these variables have a finite variance.¹
Therefore, we invoke the Law of Large Numbers (LLN) to argue that $\xi_n$, which is a sample mean
of $\langle y_j, \phi_{j,n} \rangle^2$, converges to $E[\langle y_j, \phi_{j,n} \rangle^2]$ as $J$ grows large. We now compute $E[\langle y_j, \phi_{j,n} \rangle^2]$ under two
cases. In the first case, we consider $n \ge K + 1$ (we call this the "bad statistics case"), and in the
second case, we consider $n \le K$ ("good statistics case").

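Bad statistics: Consider one of the bad statistics by choosing $n = K + 1$ without loss of
generality. We have

$$E[\langle y_j, \phi_{j,K+1} \rangle^2] = E\left[ \left( \sum_{n=1}^{K} x_j(n) \langle \phi_{j,n}, \phi_{j,K+1} \rangle \right)^2 \right]$$
$$= E\left[ \sum_{n=1}^{K} \left( x_j(n) \langle \phi_{j,n}, \phi_{j,K+1} \rangle \right)^2 \right] + E\left[ \sum_{n=1}^{K} \sum_{\ell=1, \ell \neq n}^{K} x_j(\ell) x_j(n) \langle \phi_{j,\ell}, \phi_{j,K+1} \rangle \langle \phi_{j,n}, \phi_{j,K+1} \rangle \right]$$
$$= \sum_{n=1}^{K} E[x_j(n)^2] \, E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2] + \sum_{n=1}^{K} \sum_{\ell=1, \ell \neq n}^{K} E[x_j(\ell)] \, E[x_j(n)] \, E[\langle \phi_{j,\ell}, \phi_{j,K+1} \rangle \langle \phi_{j,n}, \phi_{j,K+1} \rangle]$$

since the terms are independent. We also have $E[x_j(n)] = E[x_j(\ell)] = 0$, and so

$$E[\langle y_j, \phi_{j,K+1} \rangle^2] = \sum_{n=1}^{K} E[x_j(n)^2] \, E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2] = \sum_{n=1}^{K} \sigma^2 E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2]. \qquad (25)$$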

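To compute $E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2]$, let $\phi_{j,n}$ be the column vector $[a_1, a_2, \ldots, a_M]^T$, where each element
in the vector is i.i.d. $\mathcal{N}(0, 1)$. Likewise, let $\phi_{j,K+1}$ be the column vector $[b_1, b_2, \ldots, b_M]^T$ where the
elements are i.i.d. $\mathcal{N}(0, 1)$. We have

$$\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2 = (a_1 b_1 + a_2 b_2 + \cdots + a_M b_M)^2 = \sum_{m=1}^{M} a_m^2 b_m^2 + 2 \sum_{m=1}^{M-1} \sum_{r=m+1}^{M} a_m a_r b_m b_r.$$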

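¹In [61], we evaluate the variance of $\langle y_j, \phi_{j,n} \rangle^2$ as

$$\mathrm{Var}[\langle y_j, \phi_{j,n} \rangle^2] = \begin{cases} M\sigma^4 (34MK + 6K^2 + 28M^2 + 92M + 48K + 90 + 2M^3 + 2MK^2 + 4M^2 K), & n \in \Omega \\ 2MK\sigma^4 (MK + 3K + 3M + 6), & n \notin \Omega. \end{cases}$$

For finite $M$, $K$ and $\sigma$, the above variance is finite.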
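Taking the expected value, we have

$$E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2] = E\left[ \sum_{m=1}^{M} a_m^2 b_m^2 \right] + 2E\left[ \sum_{m=1}^{M-1} \sum_{r=m+1}^{M} a_m a_r b_m b_r \right]$$
$$= \sum_{m=1}^{M} E[a_m^2 b_m^2] + 2 \sum_{m=1}^{M-1} \sum_{r=m+1}^{M} E[a_m a_r b_m b_r]$$
$$= \sum_{m=1}^{M} E[a_m^2] E[b_m^2] + 2 \sum_{m=1}^{M-1} \sum_{r=m+1}^{M} E[a_m] E[a_r] E[b_m] E[b_r]$$

(since the random variables are independent)

$$= \sum_{m=1}^{M} (1) + 0 \quad \text{(since } E[a_m^2] = E[b_m^2] = 1 \text{ and } E[a_m] = E[b_m] = 0\text{)}$$
$$= M,$$

and thus

$$E[\langle \phi_{j,n}, \phi_{j,K+1} \rangle^2] = M. \qquad (26)$$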

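Combining this result with (25), we find that

$$E[\langle y_j, \phi_{j,K+1} \rangle^2] = \sum_{n=1}^{K} \sigma^2 M = MK\sigma^2.$$

Thus we have computed $E[\langle y_j, \phi_{j,K+1} \rangle^2]$ and can conclude that as $J$ grows large, the statistic $\xi_{K+1}$
converges to

$$E[\langle y_j, \phi_{j,K+1} \rangle^2] = MK\sigma^2. \qquad (27)$$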

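Good statistics: Consider one of the good statistics, and without loss of generality choose
$n = 1$. Then, we have

$$E[\langle y_j, \phi_{j,1} \rangle^2] = E\left[ \left( x_j(1) \|\phi_{j,1}\|^2 + \sum_{n=2}^{K} x_j(n) \langle \phi_{j,n}, \phi_{j,1} \rangle \right)^2 \right]$$
$$= E\left[ (x_j(1))^2 \|\phi_{j,1}\|^4 \right] + E\left[ \sum_{n=2}^{K} (x_j(n))^2 \langle \phi_{j,n}, \phi_{j,1} \rangle^2 \right]$$

(all other cross terms have zero expectation)

$$= E[x_j(1)^2] \, E[\|\phi_{j,1}\|^4] + \sum_{n=2}^{K} E[x_j(n)^2] \, E[\langle \phi_{j,n}, \phi_{j,1} \rangle^2] \quad \text{(by independence)}$$
$$= \sigma^2 E[\|\phi_{j,1}\|^4] + \sum_{n=2}^{K} \sigma^2 E[\langle \phi_{j,n}, \phi_{j,1} \rangle^2]. \qquad (28)$$

Extending the result from (26), we can show that $E[\langle \phi_{j,n}, \phi_{j,1} \rangle^2] = M$. Using this result in (28),

$$E[\langle y_j, \phi_{j,1} \rangle^2] = \sigma^2 E[\|\phi_{j,1}\|^4] + \sum_{n=2}^{K} \sigma^2 M. \qquad (29)$$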
To evaluate $E[\|\phi_{j,1}\|^4]$, let $\phi_{j,1}$ be the column vector $[c_1, c_2, \ldots, c_M]^T$, where the elements of the
vector are random $\mathcal{N}(0, 1)$. Define the random variable $Z = \|\phi_{j,1}\|^2 = \sum_{m=1}^{M} c_m^2$. Note that (i)
$E[\|\phi_{j,1}\|^4] = E[Z^2]$ and (ii) $Z$ is chi-squared distributed with $M$ degrees of freedom. Thus,
$E[\|\phi_{j,1}\|^4] = E[Z^2] = M(M + 2)$. Using this result in (29), we have

$$E[\langle y_j, \phi_{j,1} \rangle^2] = \sigma^2 M(M + 2) + (K - 1)\sigma^2 M = M(M + K + 1)\sigma^2.$$

We have computed the variance of $\langle y_j, \phi_{j,1} \rangle$ and can conclude that as $J$ grows large, the statistic
$\xi_1$ converges to

$$E[\langle y_j, \phi_{j,1} \rangle^2] = (M + K + 1)M\sigma^2. \qquad (30)$$

Conclusion: From (27) and (30) we conclude that

$$\lim_{J \to \infty} \xi_n = E[\langle y_j, \phi_{j,n} \rangle^2] = \begin{cases} (M + K + 1)M\sigma^2, & n \in \Omega \\ KM\sigma^2, & n \notin \Omega. \end{cases}$$

For any $M \ge 1$, these values are distinct; their ratio is $\frac{M + K + 1}{K}$. Therefore, as $J$ increases we can
distinguish between the two expected values of $\xi_n$ with overwhelming probability. □
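The two limiting values can be compared against a Monte Carlo estimate of the statistic. This is a rough sketch under stated assumptions: $\Psi = I_N$, every sensor shares the support $\{0, \ldots, K-1\}$ (0-based), the nonzero coefficients are i.i.d. $\mathcal{N}(0, \sigma^2)$, and the sizes `M`, `K`, `N`, `J` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, N, sigma, J = 5, 3, 8, 1.0, 200_000

Phi = rng.standard_normal((J, M, N))             # one M x N Gaussian matrix per sensor
x = np.zeros((J, N))
x[:, :K] = sigma * rng.standard_normal((J, K))   # common support {0, ..., K-1}
y = np.einsum("jmn,jn->jm", Phi, x)              # y_j = Phi_j x_j

# xi[n] is the sample mean over sensors of <y_j, phi_{j,n}>^2.
xi = np.mean(np.einsum("jm,jmn->jn", y, Phi) ** 2, axis=0)

good_expected = (M + K + 1) * M * sigma**2       # indices inside the support
bad_expected = M * K * sigma**2                  # indices outside the support
```

With these sizes the good statistics cluster near 45 and the bad ones near 15 in expectation, so thresholding `xi` separates the support once $J$ is large.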

I Proof of Theorem 8 

Our proof has two parts. First we argue that $\lim_{J \to \infty} \widehat{z}_C = z_C$. Then we show that this implies
vanishing probability of error in recovering each innovation $z_j$.

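Part 1: We can write our estimate as

$$\widehat{z}_C = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{M_j \sigma_j^2} \Phi_j^T y_j = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{M_j \sigma_j^2} \Phi_j^T \Phi_j x_j = \frac{1}{J} \sum_{j=1}^{J} \frac{1}{M_j \sigma_j^2} \sum_{m=1}^{M_j} (\phi^R_{j,m})^T \phi^R_{j,m} x_j,$$

where $\phi^R_{j,m}$ denotes the $m$-th row of $\Phi_j$, that is, the $m$-th measurement vector for node $j$. Since
the elements of each $\Phi_j$ are Gaussians with variance $\sigma_j^2$, the product $(\phi^R_{j,m})^T \phi^R_{j,m}$ has the property

$$E[(\phi^R_{j,m})^T \phi^R_{j,m}] = \sigma_j^2 I_N.$$

It follows that

$$E[(\phi^R_{j,m})^T \phi^R_{j,m} x_j] = \sigma_j^2 E[x_j] = \sigma_j^2 E[z_C + z_j] = \sigma_j^2 z_C$$

and, similarly, that

$$E\left[ \frac{1}{M_j \sigma_j^2} \sum_{m=1}^{M_j} (\phi^R_{j,m})^T \phi^R_{j,m} x_j \right] = z_C.$$

Thus, $\widehat{z}_C$ is a sample mean of $J$ independent random variables with mean $z_C$. From the law of
large numbers, we conclude that $\lim_{J \to \infty} \widehat{z}_C = z_C$.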

Part 2: Consider recovery of the innovation $z_j$ from the adjusted measurement vector
$\widehat{y}_j = y_j - \Phi_j \widehat{z}_C$. As a recovery scheme, we consider a combinatorial search over all $K$-sparse index sets drawn
from $\{1, 2, \ldots, N\}$. For each such index set $\Omega'$, we compute the distance from $\widehat{y}_j$ to the column
span of $\Phi_{j,\Omega'}$, denoted by $d(\widehat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'}))$, where $\Phi_{j,\Omega'}$ is the matrix obtained by sampling the
columns $\Omega'$ from $\Phi_j$. (This distance can be measured using the pseudoinverse of $\Phi_{j,\Omega'}$.)

For the correct index set $\Omega$, we know that $d(\widehat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega})) \to 0$ as $J \to \infty$. For any other
index set $\Omega'$, we know from the proof of Theorem 1 that $d(\widehat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'})) > 0$. Let

$$\zeta := \min_{\Omega' \neq \Omega} d(\widehat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega'})).$$

With probability one, $\zeta > 0$. Thus for sufficiently large $J$, we will have $d(\widehat{y}_j, \mathrm{colspan}(\Phi_{j,\Omega})) < \zeta/2$,
and so the correct index set $\Omega$ can be correctly identified. Since $\lim_{J \to \infty} \widehat{z}_C = z_C$, the innovation
estimates $\widehat{z}_j = z_j$ for each $j$ and for $J$ large enough. □
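The convergence of the common-component estimate in Part 1 can be observed in simulation. The sketch below is illustrative: it assumes equal $M_j = M$ and $\sigma_j = \sigma$ across sensors, a fixed sparse $z_C$, and zero-mean innovations with a single random nonzero entry; the function name `estimate_common` and all numerical values are hypothetical.

```python
import numpy as np

def estimate_common(J, rng, N=20, M=10, sigma=1.0):
    """Average Phi_j^T y_j / (M sigma^2) over J sensors; returns (estimate, true z_C)."""
    z_C = np.zeros(N)
    z_C[[2, 7, 11]] = [1.0, -2.0, 1.5]
    acc = np.zeros(N)
    for _ in range(J):
        z_j = np.zeros(N)
        z_j[rng.integers(N)] = rng.standard_normal()   # zero-mean innovation
        Phi_j = sigma * rng.standard_normal((M, N))    # entries with variance sigma^2
        y_j = Phi_j @ (z_C + z_j)
        acc += Phi_j.T @ y_j / (M * sigma**2)
    return acc / J, z_C

rng = np.random.default_rng(6)
est_small, z_C = estimate_common(50, rng)
est_large, _ = estimate_common(5000, rng)
err_small = np.max(np.abs(est_small - z_C))
err_large = np.max(np.abs(est_large - z_C))
```

The worst-case entry error shrinks roughly like $1/\sqrt{J}$, which is the law-of-large-numbers behavior the proof relies on before subtracting $\Phi_j \widehat{z}_C$ from each $y_j$.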